Abstract


The statistical analysis of dynamic stability failures of ships is made extremely
difficult by the problem of rarity. Few or no events of interest may be observed in the
amount of time that is feasible for model testing or even simulation. The Envelope Peaks
Over the Threshold (EPOT) method is a statistical extrapolation technique that was
developed to address this problem. It uses the principle of separation to decompose the
problem into rare and non-rare sub-problems. The non-rare problem is solved trivially
with direct statistics, while the rare problem is solved by fitting a distribution to the peaks
or maxima over a threshold. The notion of statistical confidence is carried through the
whole process. The algorithm and the principles behind it are defined and
explored in detail.
Introduction


Ships with novel hull forms may be vulnerable to dynamic stability failures because
existing stability standards, which are based on previous experience, do not cover
these unconventional shapes. Some of the failures related to dynamic stability are caused by
irregular waves and gusty wind. The inherent randomness of these environmental factors
makes the probability of stability failure a very useful measure for both design and operation.
This work focuses on partial stability failure related to large roll angle events caused
mostly by pure loss of stability on the wave crest.

Difficulties  evaluating  the  probability  of  large  roll  angles  are  related  to  both  the  rarity 
of  the  event  and  the  nonlinearity  of  the  dynamical  system  describing  the  motion  of  a  ship  in 
moderate to severe seas. The nonlinearity of the dynamical system comes mostly from the
nonlinear stiffness, which may also be a random quantity because stability changes in waves;
however, other terms (including yaw moment and roll damping) also contribute.
As  these  nonlinearities  are  essential  to  the  problem,  options  for  realistic  assessment  may  be 
limited  to  numerical  simulations  using  advanced  potential  codes  and  model  tests. 

The  rarity  of  the  large  roll  event  defines  another  set  of  requirements  for  the  solution 
of  this  problem.  Conditions  are  possible  when  the  large  roll  event  is  not  observed  during  the 
available  run-time  of  the  simulation  or  model  test.  Other  conditions  may  lead  to  very  few 
observed  large  roll  events  so  that  use  of  direct  statistical  counting  cannot  be  considered  as  a 
practical  option.  Therefore,  the  solution  must  be  an  extrapolation. 

Probabilistic  or  statistical  extrapolation  is  widely  used  in  technology;  prediction  of 
extreme  events  utilizes  extreme  value  theory.  This  type  of  methodology  is  based  on  the 
extreme  value  distribution  being  fit  to  statistical  data;  then  the  distribution  can  be  used  to 
predict  an  extreme  value  that  can  occur  with  a  given  probability  or  mean  time  that  passes 
before such an extreme value is observed. The mathematical background of these methods comes
from theorems of extreme value theory stating that the maximum of a random variable has
a limiting distribution that does not depend on the type of distribution of this variable.

The problem with a straightforward application of this method is related to the
significant nonlinearity of the dynamical system. A sample of roll motions obtained from either
experiment or numerical simulation is statistically dominated by relatively small roll angles.
As the stiffness of the dynamical system is significantly nonlinear, the properties of the system
change significantly with the roll angle. Therefore, an extreme-value distribution fitted to all
the data may not be representative of the properties of the system at large roll angles. This
problem is generally known in extreme value statistics; its standard solution is the
“peak-over-threshold” (POT) method. The idea is to use only the data that exceed a certain threshold.

Interpretation and adaptation of the POT method for the probabilistic evaluation of
dynamic stability is the core of this work. The principal aspects are as follows:

•  Relation with time of exposure: a probabilistic measure of dynamic stability should
have an explicit relation with time. The most natural way is to present it in the form of a
rate of failures (the average number of failures per unit of time, equal to the inverse
of the mean time before/between failures). This allows using Poisson flow to express
the probability of failure during a given time of exposure (assuming that stability failures are
independent random events).

•  Statistical uncertainty: the probabilistic measure of dynamic stability is
calculated from a finite-size dataset. That makes the measure a random value, and
it has to be treated accordingly. A confidence interval is the standard way to handle
the statistical uncertainty of a value derived from a finite-size sample. Statistical
uncertainty is also a major factor in choosing a numerical value for the threshold.

•  Check of consistency and convergence: correct interpretation of the POT method
needs to be checked by comparison with other methods, such as upcrossing theory.
Convergence can also be tested by comparison with the results of direct counting
using a sample of larger size.

Practical application of the method is meant for partial stability failures. Such an event
occurs when a ship encounters a large roll (or pitch) angle that may be dangerous for crew or
equipment on board. A roll angle may be dangerous on either side, port or
starboard, so exceedance on either side constitutes a stability failure. As roll (and pitch) motions
have a certain inertia, a large amplitude on one side is likely to be followed by a large
amplitude on the other side. This makes the events statistically dependent and may create a
problem with the application of the Poisson distribution and therefore with an explicit
relationship with time. To avoid this complication, an envelope may be considered instead of
the original process. Thus, the envelope peaks over the threshold are actually used for
extrapolation.

In summary, this work is focused on the application of the Envelope Peaks-Over-Threshold
(EPOT) method for the probabilistic evaluation of dynamic stability using a
dataset originating from numerical simulation or model experiment. The method can be tuned
to handle the nonlinearity of the dynamical system. The method provides an explicit relation
with time of exposure and comes with a confidence interval as a measure of statistical
uncertainty. Finally, the method is meant to be tested for consistency against other theoretical
methods and for convergence against the result of direct counting on a larger-size sample.

The original idea to use peak-over-threshold as a method to treat the problem of rarity in
a nonlinear dynamical system belongs to author B. Campbell. He also proposed the use of an
envelope as a means to evaluate the rate of exceedances on both sides while keeping
the applicability of Poisson flow. Author V. Belenky provided theoretical justification and initial
numerical testing of the method. Numerical implementation of the method is a result of joint
efforts of the authors. Sections 1 through 5 are written by V. Belenky; Section 6 is written by
B. Campbell.
1.  Theoretical  Background


1.1. Relation  with  Time 

Here we review the available formulations for relating the probability of the
occurrence of a large roll event to the time of exposure.

1.1.1. Introduction of Time - Binomial Distribution

A fundamental building block of the probability of event occurrence is the
connection between time duration and the number of events likely to be seen. Consider $n$
instants of time of short duration $\Delta t$. Assume that an event (i.e. a large roll angle) may
happen at an instant of time $t_i$ with probability $p_i$. Also assume that if there is more than
one large roll event, they can be considered independent random events. This
assumption can be justified by the expected rarity of large roll events: sufficient time
is expected to pass between two subsequent events to eliminate any
dependence.

Consider the probability that an event occurs exactly at the $i$-th time step. This
implies that the event does not happen at any of the other instants of time:

$$P(t = t_i) = p_i \prod_{j=1,\ j \ne i}^{n} q_j ; \quad q_j = 1 - p_j \tag{1.1}$$

It is clear from formula (1.1) that the probability $P$ depends on how many time
instants are included; therefore this formula already expresses the relationship between
probability and time.

If the conditions during the exposure time under assessment can be considered
unchanged (that is, the process is stationary), then there is no difference between any two
instants of time; therefore the probability $p_i$ must be the same for all instants of time. This
allows re-writing formula (1.1):

$$P(t = t_i) = p\,q^{n-1} ; \quad q = 1 - p \tag{1.2}$$

Note that the probability in formula (1.2) does not depend on the instant of
time at which the event has occurred, but it still must be a particular instant of time.

Consider the probability that the event occurs once at any instant of time. This
means that it can occur at the 1st instant or at the 2nd instant and so on. There are
exactly $n$ possible scenarios in which the event can occur once during $n$ time steps:

$$P(m = 1) = n\,p\,q^{n-1} \tag{1.3}$$

where $m$ is the number of events that occur in the length of the record.

Consider the probability of two events happening exactly at the instants $i$ and $j$.
Following the logic of formula (1.2):

$$P(m = 2 \mid t = t_i,\ t = t_j) = p^2 q^{n-2} \tag{1.4}$$

To express the probability that two events occur at any two moments of time, it is
necessary to find all possible combinations of how two elements can be chosen from $n$.
The formula for this value is known from combinatorics:

$$C(n,2) = \frac{n!}{2\,(n-2)!} \tag{1.5}$$

The probability of the event occurring at any two instants of time can be
expressed as:

$$P(m = 2) = C(n,2)\,p^2 q^{n-2} \tag{1.6}$$

Generalizing formula (1.6) for the case when the event occurs $k$ times at any instants:

$$P(k) = C(n,k)\,p^k q^{n-k} \tag{1.7}$$

Here $C(n,k)$ is the number of combinations of how $k$ values can be chosen out of $n$.
The general formula is also available from combinatorics:

$$C(n,k) = \frac{n!}{k!\,(n-k)!} \tag{1.8}$$


Finally, the probability that exactly $k$ events occur during the time of exposure
represented by $n$ time steps is:

$$P(k) = \frac{n!}{k!\,(n-k)!}\,p^k (1-p)^{n-k} \tag{1.9}$$

Formula (1.9) is known in probability theory as the binomial distribution.
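As an illustration only, formulas (1.3) and (1.9) can be evaluated numerically. The values of n and p in this sketch are assumed for the example and are not taken from this report.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k events in n time steps, formula (1.9)."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)

# Illustrative values (assumptions, not data from the report).
n, p = 20, 0.05

pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
total = sum(pmf)                                   # probabilities of all outcomes
mean = sum(k * pk for k, pk in enumerate(pmf))     # expected number of events
one_event = n * p * (1.0 - p) ** (n - 1)           # formula (1.3)

print(f"sum of probabilities  = {total:.6f}")
print(f"P(k=1) from (1.9)     = {pmf[1]:.6f}")
print(f"P(m=1) from (1.3)     = {one_event:.6f}")
print(f"mean number of events = {mean:.6f} (n*p = {n * p:.6f})")
```

For k = 1 the general formula (1.9) reduces to formula (1.3), which the two printed values are meant to show.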


1.1.2. Probability of Event - Upcrossing Theory

We now consider the transition from discrete time steps to continuous time and let
$\Delta t$ approach zero. We want to find the probability of an event occurring at time instant $t$.
We can approach this by considering the underlying process, such as the rolling of a ship.
A large roll event is defined as the exceedance of some level $a$ by the roll angle $\phi$. Consider
the roll angle as a differentiable process with a known joint distribution of roll
angle and roll rate, $f(\phi, \dot\phi)$.

Exceeding the level $a$ at time instant $t$ can be expressed in the form of the
following system of inequalities:

$$\begin{cases} \phi(t) < a \\ \phi(t + dt) > a \end{cases} \tag{1.10}$$

As the process of roll angle is differentiable:

$$\begin{cases} \phi(t) < a \\ \phi(t) + \dot\phi\,dt > a \end{cases} \tag{1.11}$$

Obviously, the system of inequalities (1.11) can only be satisfied if the roll rate is
positive. Therefore, adding the condition of positive roll rate does not change the system
of inequalities (1.11):

$$\begin{cases} \phi(t) < a \\ \phi(t) > a - \dot\phi\,dt \\ \dot\phi > 0 \end{cases} \tag{1.12}$$

The probability of the event occurring at the instant $t$ is the probability of
satisfying the system of inequalities (1.12); it can be written as the
integral of the joint distribution of roll and roll rate:

$$p = \int_0^\infty \int_{a - \dot\phi\,dt}^{a} f(\phi, \dot\phi)\, d\phi\, d\dot\phi \tag{1.13}$$

The limits of the integral over the roll angle are infinitely close to each other. The mean value theorem
allows re-writing equation (1.13) as follows:

$$p = dt \int_0^\infty f(a, \dot\phi)\,\dot\phi\, d\dot\phi \tag{1.14}$$

Formula (1.14) shows that the probability of the large roll event occurring at time instant
$t$ becomes infinitely small if time is considered continuous:

$$\lim_{\Delta t \to 0} p = \lim_{n \to \infty} p = dp \tag{1.15}$$

Therefore, the notation $dp$ is more appropriate for formulae (1.13) and (1.14) in
this case:

$$dp = dt \int_0^\infty f(a, \dot\phi)\,\dot\phi\, d\dot\phi \tag{1.16}$$

The integral in formula (1.16) then has the meaning of the derivative of the instantaneous
probability of the event with respect to time:

$$\frac{dp}{dt} = \int_0^\infty f(a, \dot\phi)\,\dot\phi\, d\dot\phi = \lambda(t) \tag{1.17}$$

Here $\lambda$ is the rate of events. In general, it is a function of time.

$$\lim_{\Delta t \to 0} p = \lim_{n \to \infty} p = dp = \lambda(t)\,dt \tag{1.18}$$

If the process of roll is stationary, the rate of events becomes constant and formula (1.17)
can be simplified, as the first derivative of a stationary process is independent of the
process itself:

$$f(\phi, \dot\phi) = f(\phi)\,f(\dot\phi) \tag{1.19}$$

and formula (1.17) becomes:

$$\lambda = f(a) \int_0^\infty \dot\phi\, f(\dot\phi)\, d\dot\phi \tag{1.20}$$
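Formula (1.20) can be checked numerically for the particular case in which both the roll angle and the roll rate are assumed to be normally distributed with zero mean. This assumption, the standard deviations, and the level a used below are chosen for illustration only. For that case the integral has a well-known closed form, which the sketch uses as a reference.

```python
import math

def normal_pdf(x: float, sigma: float) -> float:
    """Zero-mean normal probability density."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def upcrossing_rate_numeric(a, sigma_x, sigma_v, n_steps=20000):
    """Formula (1.20): f(a) times the integral of v*f(v) over v > 0 (trapezoid rule)."""
    v_max = 10.0 * sigma_v            # integrand is negligible beyond this point
    dv = v_max / n_steps
    integral = 0.0
    for i in range(n_steps + 1):
        v = i * dv
        weight = 0.5 if i in (0, n_steps) else 1.0
        integral += weight * v * normal_pdf(v, sigma_v) * dv
    return normal_pdf(a, sigma_x) * integral

def upcrossing_rate_closed_form(a, sigma_x, sigma_v):
    """Closed form of (1.20) for independent zero-mean normal angle and rate."""
    return sigma_v / (2.0 * math.pi * sigma_x) * math.exp(-0.5 * (a / sigma_x) ** 2)

# Illustrative values (assumptions): rad, rad/s, rad.
sigma_x, sigma_v, a = 0.10, 0.05, 0.30
rate_num = upcrossing_rate_numeric(a, sigma_x, sigma_v)
rate_ref = upcrossing_rate_closed_form(a, sigma_x, sigma_v)
print(f"numeric  : {rate_num:.6e} 1/s")
print(f"reference: {rate_ref:.6e} 1/s")
```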


1.1.3. Continuous Time - The Poisson Distribution

Continuing the transition to continuous time, we let $\Delta t$ become infinitely small, which
means $n \to \infty$, and look for the probability that there are exactly $k$ events during time
$T$:

$$P_T(k) = \lim_{n \to \infty} P(k) = \lim_{n \to \infty}\left[ C(n,k)\,p^k (1-p)^{n-k} \right] \tag{1.21}$$

Taking into account formula (1.20), the probability of the event occurring at time
$t$ can also be expressed in terms of discretized time:

$$p = \lambda\,\Delta t = \frac{\lambda T}{n} \tag{1.22}$$

Substituting formula (1.8) into (1.21) and expanding some of the factorials, we obtain:

$$P_T(k) = \lim_{n \to \infty}\left[ \frac{1 \cdot 2 \cdots (n-k)\,(n-k+1) \cdots (n-1)\,n}{k!\,\left(1 \cdot 2 \cdots (n-k)\right)}\; p^k (1-p)^{n-k} \right] \tag{1.23}$$

After dividing the numerator and denominator by $(1 \cdot 2 \cdots (n-k))$:

$$P_T(k) = \lim_{n \to \infty}\left[ \frac{(n-k+1) \cdots (n-1)\,n}{k!}\; p^k (1-p)^{n-k} \right] \tag{1.24}$$


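Substitution of formula (1.22) into (1.24) yields:

$$P_T(k) = \lim_{n \to \infty}\left[ \frac{(n-k+1) \cdots (n-1)\,n}{k!} \left(\frac{\lambda T}{n}\right)^{k} \left(1 - \frac{\lambda T}{n}\right)^{n-k} \right] = \lim_{n \to \infty}\left[ \frac{(n-k+1) \cdots (n-1)\,n}{n^k} \left(1 - \frac{\lambda T}{n}\right)^{-k} \frac{(\lambda T)^k}{k!} \left(1 - \frac{\lambda T}{n}\right)^{n} \right] \tag{1.25}$$

Now consider the limits of each factor:

$$\lim_{n \to \infty} \frac{(n-k+1) \cdots (n-1)\,n}{n^k} = 1 \tag{1.26}$$

$$\lim_{n \to \infty} \left(1 - \frac{\lambda T}{n}\right)^{-k} = 1 \tag{1.27}$$

$$\lim_{n \to \infty} \frac{(\lambda T)^k}{k!} = \frac{(\lambda T)^k}{k!} \tag{1.28}$$

$$\lim_{n \to \infty} \left(1 - \frac{\lambda T}{n}\right)^{n} = \exp(-\lambda T) \tag{1.29}$$

Taking into account equations (1.26) through (1.29), formula (1.25) can be presented as:

$$P_T(k) = \frac{(\lambda T)^k}{k!} \exp(-\lambda T) \tag{1.30}$$

Equation (1.30) is known as the Poisson distribution. It expresses the probability of exactly $k$
events occurring during the exposure time $T$ for a continuous process.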
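The limit in (1.21) through (1.30) can be illustrated numerically: as the number of time steps n grows with the product λT held fixed, the binomial probability approaches the Poisson probability. The rate, exposure time, and number of events below are assumed values for the sketch, not data from this report.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Formula (1.9)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

def poisson_pmf(k: int, rate: float, T: float) -> float:
    """Formula (1.30)."""
    return (rate * T) ** k / math.factorial(k) * math.exp(-rate * T)

# Illustrative values (assumptions): rate in 1/s, exposure time in s.
rate, T, k = 0.002, 1800.0, 3
target = poisson_pmf(k, rate, T)

errors = []
for n in (10, 100, 1000, 100000):
    p = rate * T / n                       # formula (1.22)
    value = binomial_pmf(k, n, p)
    errors.append(abs(value - target))
    print(f"n = {n:6d}: binomial = {value:.6f}, Poisson = {target:.6f}")
```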


1.1.4. Time Before/Between Events

Consider the cumulative distribution function (CDF) of the time before the first event
or between consecutive events. As an event may occur at any instant of time, the interval
between events is a random variable. This interval can also be defined as the time during which no
event occurs.

Consider the CDF of the time during which no event occurs. By definition, this is the
probability that the random variable is less than or equal to the argument:

$$F_T(x) = P(T \le x) \tag{1.31}$$

The CDF can also be expressed through the probability of the complementary event:

$$F_T(x) = 1 - P(T > x) \tag{1.32}$$

If no event occurs during an interval of length $x$, then the time before the event is obviously larger than the
argument. Therefore, $P(T > x)$ is the probability that no event occurs during that interval. It can be
found from formula (1.30) by setting $k$ to 0 and taking the exposure time equal to the argument $x$:

$$P(T > x) = P_x(k = 0) = \exp(-\lambda x) \tag{1.33}$$

Formula (1.32) can then be expressed as:

$$F_T(x) = 1 - \exp(-\lambda x) ; \quad x > 0 \tag{1.34}$$

This CDF is known in probability theory as the exponential distribution. The
probability density function (PDF) of the exponential distribution can be found by
differentiating the CDF, equation (1.34), with respect to its argument:

$$f_T(x) = \frac{dF_T(x)}{dx} = \lambda \exp(-\lambda x) ; \quad x > 0 \tag{1.35}$$

The exponential distribution has a single parameter $\lambda$, the rate of events. The
inverse of $\lambda$ is both the mean value and the standard deviation of the time without an event:

$$m(T) = \int_0^\infty x\,f_T(x)\,dx = \frac{1}{\lambda} \tag{1.36}$$

$$V(T) = \int_0^\infty \left(x - m(T)\right)^2 f_T(x)\,dx = \frac{1}{\lambda^2} ; \quad \sigma(T) = \frac{1}{\lambda} \tag{1.37}$$

Here $m(T)$, $V(T)$ and $\sigma(T)$ are the mean value, variance, and standard deviation of the time
between or before events, respectively.

It is possible to demonstrate that formula (1.34) can also be interpreted as the
probability of at least one event (one event, two events or more) occurring in the time
duration $T$. Expressing this probability through the probability of the complementary event
yields an expression of the same form as (1.34):

$$P_T(k \ne 0) = 1 - P_T(k = 0) = 1 - \exp(-\lambda T) \tag{1.38}$$
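The relation between the rate of events, the mean time before an event, and the probability of at least one event during an exposure time can be illustrated with a small simulation. The rate and exposure time below are assumed values, and the standard-library exponential generator stands in for the time between events.

```python
import math
import random

def prob_at_least_one(rate: float, T: float) -> float:
    """Formula (1.38): probability of one or more events during exposure time T."""
    return 1.0 - math.exp(-rate * T)

# Illustrative values (assumptions): rate in 1/s, exposure time in s.
rate, T = 0.002, 1800.0

random.seed(1)
samples = [random.expovariate(rate) for _ in range(200000)]   # times before an event

mean_time = sum(samples) / len(samples)                        # compare with 1/rate, (1.36)
fraction = sum(1 for s in samples if s <= T) / len(samples)    # empirical CDF at T, (1.34)

print(f"mean time before event: {mean_time:.1f} s (1/rate = {1.0 / rate:.1f} s)")
print(f"fraction with event before T: {fraction:.4f}")
print(f"1 - exp(-rate*T):             {prob_at_least_one(rate, T):.4f}")
```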


1.2. Statistical  Evaluation  of  Upcrossing  Rate 

We  now  shift  our  focus  to  the  statistical  estimation  of  the  rate  of  upcrossing 
events  (upcrossing  rate),  including  an  appropriate  confidence  interval.  We  also  wish  to 
relate  the  upcrossing  rate  to  statistics  of  other  related  parameters,  including  the  time 
between  events  and  the  time  before  the  first  event. 


1.2.1. Statistical Estimate of the Parameter of the Distribution


Consider a sample of a stochastic process $x$, presented in the form of an ensemble
of $N_R$ records. Each record is represented by a time history of $N_{pt}$ points with the time
step $\Delta t$ and $n = N_{pt} - 1$ time steps. Then the event of upcrossing of the level $a$ can be
associated with a random variable $U$ defined for each time step as follows:

$$U_{i,j} = \begin{cases} 1 & x_{i,j} < a \,\cap\, x_{i+1,j} > a \\ 0 & \text{otherwise} \end{cases} ; \quad i = 1,\dots,n ; \quad j = 1,\dots,N_R \tag{1.39}$$

The number of upcrossing events occurring at the $i$-th time step over all records is calculated
as:

$$N_{Ui} = \sum_{j=1}^{N_R} U_{i,j} \tag{1.40}$$

The estimate of the probability that the upcrossing event occurs at time instant $t_i$ is

$$p_i^* = \frac{N_{Ui}}{N_R} \tag{1.41}$$

This value is a statistical estimate of the infinitely small probability $dp$, as introduced in
formula (1.15):

$$dp(t = t_i) = \lim_{\substack{N_R \to \infty \\ \Delta t \to 0}} p_i^* \tag{1.42}$$

Then, following formula (1.22), the estimate $\lambda_i^*$ of the parameter of the Poisson
distribution at the $i$-th time step is expressed as:

$$\lambda_i^* = \frac{p_i^*}{\Delta t} = \frac{1}{N_R\,\Delta t} \sum_{j=1}^{N_R} U_{i,j} \tag{1.43}$$

If process $x$ is stationary, the parameter of the Poisson distribution, $\lambda$, does not depend on
time and the value $\lambda_i^*$ estimated for different time steps tends to the same limit with
an increasing number of records. If the process $x$ can also be considered ergodic (that is, the
statistical characteristics of $x$ can be estimated from one record if it is long enough), then
the estimates $p^*$ and $\lambda^*$ can be evaluated using all the time steps:

$$p^* = \frac{1}{n\,N_R} \sum_{i=1}^{n} \sum_{j=1}^{N_R} U_{i,j} ; \quad \lambda^* = \frac{p^*}{\Delta t} = \frac{1}{n\,N_R\,\Delta t} \sum_{i=1}^{n} \sum_{j=1}^{N_R} U_{i,j} \tag{1.44}$$

Regrouping formula (1.44), we see that it contains the number of events in each record:

$$\lambda^* = \frac{1}{N_R\,n\,\Delta t} \sum_{j=1}^{N_R} N_{Uj} ; \quad N_{Uj} = \sum_{i=1}^{n} U_{i,j} \tag{1.45}$$

The value $N_{Uj}$ is a random number represented by its sample. The volume of this sample
is the number of records $N_R$. The mean value of this random number is estimated as:

$$m_U^* = \frac{1}{N_R} \sum_{j=1}^{N_R} N_{Uj} \tag{1.46}$$

Substitution of formulae (1.45) and (1.46) into equation (1.44) yields:

$$\lambda^* = m(\lambda_i^*) = \frac{m_U^*}{n\,\Delta t} = \frac{m_U^*}{T_R} \tag{1.47}$$

where $T_R$ is the time duration of a record.

Formula (1.47) also allows an interpretation of the meaning of the $\lambda$ parameter of the
exponential distribution: it is the average number of crossings per unit of time, also
known as the “rate of events”.
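The counting procedure of this subsection can be sketched as follows. The two sinusoidal records are synthetic stand-ins chosen so that the number of upcrossings is known in advance; they are an assumption of the example and are not wave or roll data from this report.

```python
import math

def count_upcrossings(record, level):
    """Number of steps with x[i] < level and x[i+1] > level in one record."""
    return sum(1 for x0, x1 in zip(record, record[1:]) if x0 < level and x1 > level)

def estimate_rate(records, level, dt):
    """Mean number of upcrossings per record divided by the record duration n*dt."""
    n = len(records[0]) - 1                      # number of time steps per record
    counts = [count_upcrossings(r, level) for r in records]
    m_u = sum(counts) / len(records)             # mean number of events per record
    return counts, m_u / (n * dt)

# Synthetic example (assumption): 10 s period, 100 s records, two phase shifts.
dt, n_points, period = 0.5, 201, 10.0
records = [
    [math.sin(2.0 * math.pi * i * dt / period + phase) for i in range(n_points)]
    for phase in (0.0, 1.0)
]

counts, rate = estimate_rate(records, level=0.5, dt=dt)
print("upcrossings per record:", counts)
print(f"estimated rate: {rate:.3f} 1/s")
```

Each sinusoid crosses the level once per period, so ten upcrossings per 100 s record and a rate of one event per period are expected here.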


1.2.2.  Confidence  Interval  for  Rate  of  Events 

Formula (1.47) also reveals the statistical meaning of the rate of events: it is a
value proportional to the estimate of the average number of events in each record.
Therefore, in order to calculate the confidence interval for the rate of events, it is enough
to find the confidence interval for $m_U^*$.


The value $m_U^*$ is an estimate of the mean of the random variable $N_U$. This variable
has the binomial distribution, as it represents the number of upcrossings occurring within
a finite number of time steps. As the number of events is countable, this random variable
is discrete and it is defined by the following probability mass function (the discrete
counterpart of the PDF):

$$P(k) = \frac{n!}{k!\,(n-k)!}\,p^k (1-p)^{n-k} \tag{1.48}$$

where $k$ is the number of crossings observed during the time duration of a record. In this
case, $N_{Uj}$ represents a realized sample of $k$. The theoretical mean value $m_U$ and variance
$V_U$ of the binomial distribution (1.48) are known:

$$m_U = n\,p ; \quad V_U = n\,p\,(1-p) \tag{1.49}$$
The estimate of the mean value (1.46) is a random number, as it is based on a sum of $N_R$ random
variables, each of which has the binomial distribution with the same parameter. The
distribution of the mean value estimate is important for evaluating the
confidence interval.


Independent of its distribution, the variance of $m_U^*$ can be found as the variance of a
sum of independent variables:

$$V(m_U^*) = V\!\left(\frac{1}{N_R} \sum_{j=1}^{N_R} N_{Uj}\right) = \frac{N_R\,V_U}{N_R^2} = \frac{V_U}{N_R} \tag{1.50}$$

As the exact value of the variance $V_U$ is not known, we substitute an estimate $V_U^*$ for it:

$$V^*(m_U^*) = \frac{V_U^*}{N_R} \tag{1.51}$$


As equation (1.46) represents the sum of identically distributed random variables, the
distribution of the value $m_U^*$ tends to normal as the sample size $N_R$ increases (Central Limit
Theorem). Belenky et al. (2007) used this approach to evaluate the confidence interval,
with the distribution of the estimate expressed as:

$$f(m) = \frac{1}{\sqrt{2\pi\,V(m_U^*)}} \exp\!\left( -\frac{\left(m - m(m_U^*)\right)^2}{2\,V(m_U^*)} \right) \tag{1.52}$$

It is known that the mean value estimate is unbiased; therefore, the mean value of the
estimate is taken equal to the estimate itself:

$$m(m_U^*) = m_U^* \tag{1.53}$$

The upper and lower boundaries of the confidence interval, $UB$ and $LB$, must satisfy the
following equation:

$$P_\beta = \int_{LB}^{UB} f(m)\,dm = F(UB) - F(LB) ; \quad F(m) = \int_{-\infty}^{m} f(z)\,dz \tag{1.54}$$

where $P_\beta$ is the accepted confidence level (typically 90% or 95%) and $F(m)$ is the CDF.
To find the boundaries, the inverse function of the CDF, $Q(P)$ (also known as the
quantile function), is introduced:

$$m = Q(P) = \mathrm{Inv}\{F(m)\} ; \quad P = F(m) \tag{1.55}$$

where $P$ is a probability. The inverse function returns the value corresponding to that
probability. Then:

$$LB = Q\!\left(\frac{1 - P_\beta}{2}\right) ; \quad UB = Q\!\left(\frac{1 + P_\beta}{2}\right) \tag{1.56}$$

Recognizing that the normal distribution is symmetric around its mean value, the boundaries
can be expressed through the half-breadth of the confidence interval:

$$LB(m_U^*) = m_U^* - \varepsilon ; \quad UB(m_U^*) = m_U^* + \varepsilon ; \quad \varepsilon = K_\beta\,\sigma \tag{1.57}$$

Here $\varepsilon$ is the half-breadth of the confidence interval and $K_\beta$ is the quantile of the
standard normal distribution for the probability $(1 + P_\beta)/2$. Typical values are:
$P_\beta = 0.95$, $K_\beta = 1.959964$; $P_\beta = 0.9973$, $K_\beta = 3.0$.

Here $\sigma$ is the standard deviation of the variable; the latter pair of values is also commonly
referred to as the “six-sigma rule”.

Following Belenky et al. (2007), the boundaries of the confidence interval for the
rate of events $\lambda^*$ are defined as:

$$\lambda_{low}^* = \frac{LB(m_U^*)}{T_R} ; \quad \lambda_{up}^* = \frac{UB(m_U^*)}{T_R} \tag{1.58}$$

Use of the normal distribution (1.52) is only justified if $N_R$ is large, as it is based on the
Central Limit Theorem. The question of how large $N_R$ should be to employ the normal
distribution may require additional study.
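A minimal sketch of the normal-approximation confidence interval for the rate of events is given below. The per-record counts are hypothetical, and using the sample variance of the counts as the variance estimate is an assumption of this sketch rather than a statement of the estimator used in the report.

```python
from statistics import NormalDist, mean, variance

def rate_ci_normal(counts, record_duration, conf=0.95):
    """Normal-approximation confidence interval for the rate of events.

    counts: number of upcrossings in each record (hypothetical input).
    Returns (lower, estimate, upper) in events per unit of time.
    """
    n_rec = len(counts)
    m_u = mean(counts)                            # mean number of events per record
    var_mean = variance(counts) / n_rec           # assumed estimate of V(m_U*)
    k_beta = NormalDist().inv_cdf(0.5 * (1.0 + conf))
    eps = k_beta * var_mean ** 0.5                # half-breadth of the interval
    return ((m_u - eps) / record_duration,
            m_u / record_duration,
            (m_u + eps) / record_duration)

# Hypothetical counts for ten 30-minute records (not data from the report).
counts = [3, 4, 2, 5, 3, 4, 6, 2, 3, 4]
low, est, up = rate_ci_normal(counts, record_duration=1800.0)
print(f"rate = {est:.5f} 1/s, 95% interval = [{low:.5f}, {up:.5f}] 1/s")
```

With only ten records the normal approximation is itself questionable, which is the concern raised in the paragraph above.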


At the same time, the sum of independent variables with a binomial distribution,
all with the same parameter $p$, also has a binomial distribution with the same parameter
$p$, but with the number of cases equal to the sum of the numbers of cases. If two such variables
have the same number of cases $n$, the new number of cases becomes $2n$. In the case of $N_R$
records, the total number of cases becomes:

$$N = N_R \cdot n \tag{1.59}$$

Then the probability that $N_R$ records, each with $n$ time steps, will contain $k$ upcrossings
can be expressed as:

$$P(k) = \frac{N!}{k!\,(N-k)!}\,p^k (1-p)^{N-k} \tag{1.60}$$

Formula (1.60) can also be interpreted as the probability mass function for the
number of upcrossings in all the records:

$$f(k) = \frac{N!}{k!\,(N-k)!}\,p^k (1-p)^{N-k} \tag{1.61}$$

The mean value for this distribution is:

$$m_{SU} = N\,p \tag{1.62}$$

Taking into account (1.44), the estimate for this mean value is equal to the number of
crossings that were actually observed:

$$m_{SU}^* = N\,p^* = N_R\,n\,p^* = \sum_{i=1}^{n} \sum_{j=1}^{N_R} U_{i,j} \tag{1.63}$$

The variance of the number of crossings according to distribution (1.60) can be expressed
as follows:

$$V_{SU} = N\,p\,(1 - p) \tag{1.64}$$

The estimate of this variance is related to $V_U^*$, the estimate of the variance of the number
of crossings during one record, as defined in (1.49), and $V^*(m_U^*)$, the estimate of the
variance of the mean number of crossings per record from equation (1.51):

$$V_{SU}^* = N\,p^*(1 - p^*) = N_R\,n\,p^*(1 - p^*) = N_R\,V_U^* = N_R^2\,V^*(m_U^*) \tag{1.65}$$

The boundaries of the confidence interval for the observed number of crossings for all the
records are still defined by formula (1.56), interpreting $Q$ as the inverse function of the
binomial CDF of the number of crossings for all the records. As the binomial distribution
has non-zero skewness, the symmetrical presentation (1.57) is not applicable:

$$LB(N_U^*) = Q\!\left(\frac{1 - P_\beta}{2}\right) ; \quad UB(N_U^*) = Q\!\left(\frac{1 + P_\beta}{2}\right) \tag{1.66}$$

The boundaries of the confidence interval for the rate of events, without the
assumption of normality, are defined as:

$$\lambda_{low}^* = \frac{LB(N_U^*)}{N_R\,n\,\Delta t} = \frac{LB(N_U^*)}{N_R\,T_R} ; \quad \lambda_{up}^* = \frac{UB(N_U^*)}{N_R\,n\,\Delta t} = \frac{UB(N_U^*)}{N_R\,T_R} \tag{1.67}$$

Here $\lambda_{low}^*$ and $\lambda_{up}^*$ are the lower and upper boundaries of the confidence interval,
respectively.
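The binomial confidence interval can be sketched with a direct summation of the probability mass function. The quantile routine below is a simple assumed implementation (smallest k whose cumulative probability reaches the requested level), and the number of events and records in the usage lines are hypothetical.

```python
import math

def binomial_quantile(prob: float, n: int, p: float) -> int:
    """Smallest k such that the binomial(n, p) CDF at k is at least prob.

    The mass function is evaluated in log space to avoid overflow for large n.
    Requires 0 < p < 1.
    """
    log_p, log_q = math.log(p), math.log1p(-p)
    cdf = 0.0
    for k in range(n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_q)
        cdf += math.exp(log_pmf)
        if cdf >= prob:
            return k
    return n

def rate_ci_binomial(n_events, n_records, n_steps, dt, conf=0.95):
    """Boundaries of the rate of events without the assumption of normality."""
    n_cases = n_records * n_steps                 # total number of cases
    p_est = n_events / n_cases                    # estimate of p
    lb = binomial_quantile(0.5 * (1.0 - conf), n_cases, p_est)
    ub = binomial_quantile(0.5 * (1.0 + conf), n_cases, p_est)
    total_time = n_cases * dt
    return lb, ub, lb / total_time, n_events / total_time, ub / total_time

# Hypothetical input (not from the report): 30 events in 50 records.
lb, ub, rate_low, rate_est, rate_up = rate_ci_binomial(30, 50, 3599, 0.5)
print(f"count boundaries: {lb} .. {ub}")
print(f"rate: {rate_est:.3e} 1/s, interval [{rate_low:.3e}, {rate_up:.3e}] 1/s")
```

The two count boundaries generally do not sit at equal distances from the observed count, which is why the symmetric form is not used here.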


1.2.3.  Numerical  Example 

Consider the process of simulated wave elevations (from a linear model)
characterized by a Bretschneider spectrum calculated for a significant wave height $H_S = 11.5$
m and a modal period $T_m = 16.4$ s. Then the mean period is:

$$T_1 = 0.773\,T_m \tag{1.68}$$

The following formulation of the spectral density was used (see also Figure 1.1):

$$s(\omega) = A\,\omega^{-5} \exp\!\left(-B\,\omega^{-4}\right) ; \quad A = 173\,H_S^2\,T_1^{-4} ; \quad B = 691\,T_1^{-4} \tag{1.69}$$


Figure 1.1 Bretschneider Spectral Density; Significant Wave Height $H_S = 11.5$ m and Modal Period $T_m = 16.4$ s


The wave-elevation time history for a fixed point in space was computed using an
inverse Fourier transform as follows:

$$\zeta_W(t) = \sum_{i=1}^{N_\omega} r_{Wi} \cos(\omega_i t + \varphi_i) \tag{1.70}$$

Here $\omega_i$ is the set of frequencies used for discretization of the spectral density (1.69), $r_{Wi}$ is
the amplitude of the $i$-th component and $\varphi_i$ is the phase shift of the $i$-th component.

The frequency set for this sample consists of uniformly spaced frequencies. The
number of frequencies has been chosen to avoid the self-repeating effect described in
(Belenky, et al., 2007). The total number of frequencies was 180, with 42 components
before the peak of the spectral density curve and 138 after the peak. The width of the
frequency band was expressed through the frequency of the peak of the spectral density
curve:

$$Bnd(\omega) = 1.15\,\omega_{max} = 0.44\ \mathrm{s^{-1}} ; \quad \omega_{max} = \frac{2\pi}{T_m} \tag{1.71}$$

Then the frequency step is calculated as follows:

$$\Delta\omega = \frac{Bnd(\omega)}{N_\omega} = 0.0032\ \mathrm{s^{-1}} \tag{1.72}$$

The lower and upper frequency limits are 0.25 s$^{-1}$ and 0.824 s$^{-1}$,
respectively. As the frequency spacing is uniform, the duration of a single record should
not exceed:

$$T_{Single} = \frac{2\pi}{\Delta\omega} = 1968\ \mathrm{s} = 32.8\ \mathrm{min} \tag{1.73}$$

Taking into account (1.73), the time duration of a single record was set to $T_R = 30$ min.
To ensure that this spectral discretization does not lead to the self-repeating effect, the
autocorrelation function needs to be checked:

$$R(\tau) = \int_0^\infty s(\omega) \cos(\omega\tau)\,d\omega ; \quad R_j = \sum_{i=1}^{N_\omega} S_i \cos\!\left(\omega_i\,\Delta t\,(j - 1)\right) ; \quad j = 1, 2, \dots, N_t \tag{1.74}$$

Here the time step $\Delta t = 0.5$ s; $S_i$ is the value of the discretized power spectrum at frequency $\omega_i$ and $N_t$
is the number of time steps.

$$S_i = \int_{\omega_i - 0.5\Delta\omega}^{\omega_i + 0.5\Delta\omega} s(\omega)\,d\omega = 0.5\,\Delta\omega \left( s(\omega_i - 0.5\Delta\omega) + s(\omega_i + 0.5\Delta\omega) \right) \tag{1.75}$$
(0,  0.5A(t), 

The  autocorrelation  function  is  shown  in  Figure  1.2.  It  can  be  seen  from  this  figure  there 
is  no  self-repeating  effect. 

The  variance  of  wave  elevations  calculated  from  the  discretized  spectrum  should 
also  be  cheeked.  As  the  frequency  band  is  limited,  some  energy  at  high  frequencies  was 
not  included  in  the  discretized  model. 
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(1.76) 


.V, 


=  ^  =  2.79 


»K  H  sd  =  4  Oj  =  11.17m 


Here $V_d$ is the variance of wave elevations as discretized, $\sigma_d$ is the standard deviation
and $H_{sd}$ is the significant wave height as discretized. Because of the frequency truncation
limits, the significant wave height as discretized is slightly less than the input value of
11.5 m. The difference, however, is less than 3% and can be considered acceptable.


Figure 1.2. Autocorrelation Function of Wave Elevations Calculated from Discretized Spectrum

The amplitudes of the wave components in (1.70) are calculated from the discretized
power spectrum as:


(1.77) 


Finally, the phase shift $\varphi_i$ is assumed to be a random variable uniformly distributed
from 0 to $2\pi$. Each set of $N_\omega = 180$ phase angles corresponds to one record of 30 minutes.
The dataset used in this example includes 200 such records.

As shown in Figure 1.3 for the level of $a = 7.5$ m, three crossings were found in the
first record:


(1.78)


A total of 721 upcrossing events were observed for all the records:


(1.79)

The total number of cases of the auxiliary variable $U$ is:

$$N = N_r \cdot n = 200 \cdot 3599 = 719800 \tag{1.80}$$
Figure 1.3. Record 1 of the Process of Wave Elevations with Three Upcrossings Through a Level of 7.5 m

Using formula (1.45), the parameter of the Binomial Distribution, $p$, and the upcrossing
rate, $\lambda$, are estimated as:


(1.81) 


(1.82) 
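
As a worked illustration of this step, the sketch below computes both estimates from the counts quoted in this example. It assumes that the rate estimate is the binomial parameter divided by the time step, $\lambda^* = p^*/\Delta t$; this reading is an assumption, chosen because it is consistent with the confidence interval reported in (1.89). The function name is hypothetical.

```python
def estimate_upcrossing_rate(n_upcrossings, n_records, n_steps, dt):
    """Return (p_hat, rate_hat) from counted upcrossings.

    n_steps is the number of cases of the auxiliary variable per record,
    dt is the time step in seconds. rate_hat = p_hat / dt is an assumed form.
    """
    n_cases = n_records * n_steps          # total number of Bernoulli trials
    p_hat = n_upcrossings / n_cases        # estimate of the binomial parameter
    rate_hat = p_hat / dt                  # upcrossings per second
    return p_hat, rate_hat

# Values from the numerical example: 721 upcrossings, 200 records,
# 3599 cases per record, time step 0.5 s.
p_hat, rate_hat = estimate_upcrossing_rate(721, 200, 3599, 0.5)
```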


The theoretical value for the rate of events can be found by substitution of the Normal
Distribution (Equation 1.55) into Formula (1.19):


(1.83)


Formula (1.83) includes the variance of the temporal derivative of wave elevations as
discretized:


(1.84)
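
A minimal sketch of this calculation follows. It assumes that formula (1.83) has the form of the standard upcrossing-rate result for a zero-mean stationary Gaussian process (Rice's formula) and that the derivative variance is the sum of $\omega_i^2 S_i$ over the discretized spectrum. Both forms are assumptions made for illustration, and the function names are hypothetical.

```python
import math

def derivative_variance(omegas, weights):
    """Variance of the time derivative for a discretized spectrum (assumed form)."""
    return sum(w**2 * s_i for w, s_i in zip(omegas, weights))

def gaussian_upcrossing_rate(level, var_x, var_xdot):
    """Mean upcrossing rate of `level` for a zero-mean Gaussian process (Rice)."""
    return (math.sqrt(var_xdot / var_x) / (2.0 * math.pi)
            * math.exp(-level**2 / (2.0 * var_x)))

# Toy numbers only, to show the call pattern.
toy_rate = gaussian_upcrossing_rate(level=0.0, var_x=1.0, var_xdot=1.0)
toy_var = derivative_variance([1.0, 2.0], [1.0, 1.0])
```

The rate falls off exponentially with the square of the crossing level, which is why high levels produce so few events in a fixed observation time.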


The theoretical value and the estimate of the upcrossing rate seem to be quite close, but
closeness cannot be judged from the two numbers alone. The estimate of the upcrossing
rate $\lambda^*$ is a random number. Therefore a confidence interval needs to be calculated in
order to judge the closeness of the estimate and the theoretical value.

Two methods of calculating the confidence interval for the upcrossing rate
were considered in the section above. The first one considers the estimate of the average
number of crossings per record (see formula 1.45):


(1.85) 


This figure is assumed to follow a normal distribution with the mean value equal to itself:

$$m(m_1^*) = m_1^* = 3.605 \tag{1.86}$$


The variance of this estimate is:

$$V(m_1^*) = \frac{N}{N_r^2}\,p^*(1 - p^*) = 0.018007 \tag{1.87}$$


For a confidence level of $\beta = 0.95$, the half width of the confidence interval for $m_1^*$ is:

$$\varepsilon = 1.959964 \cdot \sqrt{V(m_1^*)} = 0.263007 \tag{1.88}$$

This yields the following values for the lower and upper boundaries of the rate of upcrossings
(see equation 1.57):

$$\lambda^*_{low} = \frac{m_1^* - \varepsilon}{T} = 0.001857; \quad \lambda^*_{up} = \frac{m_1^* + \varepsilon}{T} = 0.002149 \tag{1.89}$$
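
The first method can be reproduced with the standard library alone. The sketch below is an illustration, not the authors' implementation; it assumes a record duration of 1800 s (30 minutes) as the divisor that converts crossings per record into a rate, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def rate_ci_normal(n_up, n_records, n_steps, record_duration, beta=0.95):
    """Normal-approximation confidence interval for the upcrossing rate."""
    n_cases = n_records * n_steps
    p_hat = n_up / n_cases
    m1 = n_up / n_records                               # mean crossings per record
    var_m1 = n_cases * p_hat * (1.0 - p_hat) / n_records**2
    k_beta = NormalDist().inv_cdf(0.5 * (1.0 + beta))   # 1.959964 for beta = 0.95
    eps = k_beta * math.sqrt(var_m1)                    # half width for m1
    lam_low = (m1 - eps) / record_duration
    lam_up = (m1 + eps) / record_duration
    return m1, var_m1, eps, lam_low, lam_up

m1, var_m1, eps, lam_low, lam_up = rate_ci_normal(721, 200, 3599, 1800.0)
```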


The second method is to calculate the confidence interval for $\lambda$ directly from the random
variable $N_U$, using its binomial distribution with the parameter $p^*$ estimated with formula
(1.79) and with the boundaries defined by formulae (1.65):

$$N_U^* = 721; \quad LB(N_U) = Q\left(\frac{1-\beta}{2}\right) = 669; \quad UB(N_U) = Q\left(\frac{1+\beta}{2}\right) = 774 \tag{1.90}$$


The boundaries of the confidence interval for $\lambda$ are given by formula (1.66):

$$\lambda^*_{low} = \frac{LB(N_U)}{N_r T}; \quad \lambda^*_{up} = \frac{UB(N_U)}{N_r T} = 0.00215 \tag{1.91}$$
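
The second method can be sketched without external libraries by summing the binomial probability mass function until the requested quantile is reached. This is a hedged illustration: the quantile is taken here as the smallest count whose cumulative probability reaches the requested level, which is an assumed reading of $Q$, and the helper names are hypothetical.

```python
import math

def binomial_log_pmf(k, n, p):
    """Logarithm of the binomial probability of exactly k successes in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def binomial_quantile(q, n, p):
    """Smallest k such that P(X <= k) >= q."""
    cdf = 0.0
    for k in range(n + 1):
        cdf += math.exp(binomial_log_pmf(k, n, p))
        if cdf >= q:
            return k
    return n

def rate_ci_binomial(n_up, n_cases, total_time, beta=0.95):
    """Confidence interval for the rate from the binomial count of upcrossings."""
    p_hat = n_up / n_cases
    lb = binomial_quantile(0.5 * (1.0 - beta), n_cases, p_hat)
    ub = binomial_quantile(0.5 * (1.0 + beta), n_cases, p_hat)
    return lb, ub, lb / total_time, ub / total_time

# 721 upcrossings in 719800 cases; total time assumed as 200 records of 1800 s.
lb, ub, lam_low_b, lam_up_b = rate_ci_binomial(721, 719800, 200 * 1800.0)
```

Working with the logarithm of the mass function avoids overflow in the binomial coefficient for the very large number of trials in this example.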


Both methods gave almost identical results for the confidence interval. The theoretical
value of the upcrossing rate is contained within the confidence interval, as shown in
Figure 1.4.

Figure 1.4. Confidence Intervals for the Upcrossing Rate Calculated with Normal or Binomial
Distribution for a Level of 7.5 m

The reason the results of two different methods are so close is the relatively large
number of crossings. With a large sample, the binomial distribution of the total number
of crossings can be very well approximated with the normal distribution with the
following mean value and variance:

$$m_{N_U} = N_U^* = 721; \quad V_{N_U} = N p^*(1 - p^*) = 720.28 \tag{1.92}$$

The binomial distribution of the total number of crossings with parameter $p^*$ and the normal
distribution with mean value and variance (1.92) are shown in Figure 1.5.


Figure 1.5. Normal and Binomial Distribution of Number of Crossings for the Level of 7.5 m

The difference between the two distributions can be slightly larger if the level for
crossing is raised and the number of crossings is smaller, but even in the case of a level of 11 m with
only 10 crossings, the difference in confidence intervals calculated with the two different
methods is not significant; see Figure 1.6 and Figure 1.7.


Figure 1.6. Normal and Binomial Distribution of Number of Crossings for the Level of 11 m

Figure 1.7. Confidence Intervals for the Upcrossing Rate Calculated with Normal and Binomial
Distributions for a Level of 11.0 m


1.2.4.  Mean  Time  Before  and  Between  Events 

An  estimate  for  the  rate  of  events  can  also  be  evaluated  from  statistics  of  time 
before  or  between  events  using  formula  (1.35).  This  provides  an  important  link  to 
reliability  engineering,  where  time  to  failure  is  one  of  the  principal  metrics. 

In general, stability failure can be considered in terms of conventional reliability
theory (Sevastianov, 1963, 1994). Most of engineering practice in reliability works with
statistics of time-to-failure, where the Exponential Distribution is only one of many
models used (see, for example, Meeker and Escobar 1998). One of the authors applied
this approach for stability failures using statistical data resulting from numerical
simulation (Ayyub, et al., 2006). These data included time before failure, so a classic
reliability approach was used. The main advantage of such an approach is that it does not
require a-priori knowledge of the distribution of time before or between the failures.
While an exponential distribution was used, it is not required and any other appropriate
model could be applied.

The assumption that time before or between failures follows an exponential
distribution allows significant simplification of the problem as only one parameter needs
to be estimated. As shown above, the exponential distribution is derived formally from
upcrossing theory assuming the independence of upcrossings (this assumption is
considered in detail in the next section). The exponential distribution connects the mean
number of upcrossings and parameters of the distribution of time before or between the
failures.

The assumption of an exponential distribution does not contradict the experimental
results obtained by Ananiev and Savchuk (1982). A description of these experiments (in
English) is available from Belenky and Sevastianov (2007).

Once the exponential distribution is accepted a-priori, it can be demonstrated that
counting events provides a more efficient way to estimate the distribution parameters. To
carry out such a demonstration, the wave dataset described above was used. In addition
to counting upcrossings, a sample of time between crossings (including the interval
between the start and the first crossing) and a sample of intervals before the first crossing
have been populated:
(1.93)


Here $T_{cr}$ is the set of time intervals between upcrossings, index $i$ relates to the $i$-th
crossing occurring in the $j$-th record, $k$ is a counter for upcrossings over the entire dataset and $N_{cr}$
is the total number of upcrossings in the dataset.

$$T_{Fj} = t_{1j} \tag{1.94}$$


Here $T_{Fj}$ is the time before the first upcrossing and the index $j$ refers to the $j$-th record.

The estimate of the mean value for the time between and before the crossings is:

$$m^*_{Tcr} = \frac{1}{N_{cr}} \sum_{k=1}^{N_{cr}} T_{cr\,k} \tag{1.95}$$


The estimate $m^*_{Tcr}$ is a random number, as it is a finite sum of random variables
with an exponential distribution. The distribution of a sum tends to normal per the
Central Limit Theorem (strictly speaking, it is a truncated normal distribution as the
estimate of mean time cannot be negative).


As the mean value is an unbiased estimate, the mean value of its distribution
equals the estimate itself (1.93), while the variance is expressed as:

$$V(m^*_{Tcr}) = \frac{V^*_{Tcr}}{N_{cr}} \tag{1.96}$$

The variance of time between upcrossings $V^*_{Tcr}$ can be estimated as:

$$V^*_{Tcr} = \frac{1}{N_{cr} - 1} \sum_{k=1}^{N_{cr}} \left(T_{cr\,k} - m^*_{Tcr}\right)^2 \tag{1.97}$$

Then, the confidence interval boundaries for the estimate of the mean time between crossings are expressed as:

$$m^*_{TcrU} = m^*_{Tcr} + K_\beta \sqrt{V(m^*_{Tcr})}; \quad m^*_{TcrL} = m^*_{Tcr} - K_\beta \sqrt{V(m^*_{Tcr})} \tag{1.98}$$

Here $m^*_{TcrU}$ and $m^*_{TcrL}$ are the upper and lower confidence interval boundaries of the estimate.
The estimate of the upcrossing rate can then be expressed as:

$$\lambda^*_T = \left(m^*_{Tcr}\right)^{-1}; \quad \lambda^*_{TU} = \left(m^*_{TcrL}\right)^{-1}; \quad \lambda^*_{TL} = \left(m^*_{TcrU}\right)^{-1} \tag{1.99}$$

where $\lambda^*_T$ is the upcrossing rate estimate based on time between upcrossings, while $\lambda^*_{TU}$
and $\lambda^*_{TL}$ are the upper and lower boundaries of the confidence interval, respectively.
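
The time-based estimate can be sketched as follows. This is a minimal illustration with a hypothetical function name and toy input; it takes the variance of the mean as the sample variance divided by the sample size and uses a normal quantile for $K_\beta$, which are the usual choices and are assumed here. Note that the upper rate boundary comes from the lower boundary of the mean time, and vice versa.

```python
import math
from statistics import NormalDist, mean, variance

def rate_from_intervals(intervals, beta=0.95):
    """Return (rate, rate_low, rate_up) from a sample of time intervals."""
    n = len(intervals)
    m = mean(intervals)
    var_of_mean = variance(intervals) / n        # sample variance over sample size
    k_beta = NormalDist().inv_cdf(0.5 * (1.0 + beta))
    half_width = k_beta * math.sqrt(var_of_mean)
    m_low, m_up = m - half_width, m + half_width
    # The normal approximation can push m_low to zero or below for small
    # samples; the rate boundary is then unbounded.
    rate_up = 1.0 / m_low if m_low > 0.0 else math.inf
    return 1.0 / m, 1.0 / m_up, rate_up

# Toy intervals in seconds, for illustration only.
rate, rate_low, rate_up = rate_from_intervals([400.0, 500.0, 600.0, 500.0])
```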

Another estimate can be evaluated using only the time interval before the first
upcrossing:

$$m^*_{Fcr} = \frac{1}{N_{Fcr}} \sum_{j=1}^{N_{Fcr}} T_{Fj} \tag{1.100}$$

Here $N_{Fcr}$ is the number of records with at least one upcrossing. The boundaries of the
confidence interval are calculated in a manner similar to formulae (1.94-1.97):

$$V^*_{Fcr} = \frac{1}{N_{Fcr} - 1} \sum_{j=1}^{N_{Fcr}} \left(T_{Fj} - m^*_{Fcr}\right)^2; \quad V(m^*_{Fcr}) = \frac{V^*_{Fcr}}{N_{Fcr}} \tag{1.101}$$

$$m^*_{FcrU} = m^*_{Fcr} + K_\beta \sqrt{V(m^*_{Fcr})}; \quad m^*_{FcrL} = m^*_{Fcr} - K_\beta \sqrt{V(m^*_{Fcr})} \tag{1.102}$$

$$\lambda^*_F = \left(m^*_{Fcr}\right)^{-1}; \quad \lambda^*_{FU} = \left(m^*_{FcrL}\right)^{-1}; \quad \lambda^*_{FL} = \left(m^*_{FcrU}\right)^{-1} \tag{1.103}$$

Here $V^*_{Fcr}$ is the estimate of the variance of the time before the first upcrossing, $V(m^*_{Fcr})$
is the estimate of the variance of the mean of time before the first upcrossing, and $m^*_{FcrU}$ and
$m^*_{FcrL}$ are the upper and lower boundaries of the confidence interval for the mean time
before the first crossing. $\lambda^*_F$, $\lambda^*_{FU}$, and $\lambda^*_{FL}$ are the estimates of the upcrossing rate
based on the time before the first crossing and its upper and lower boundaries,
respectively.

Figure 1.8 compares the theoretical rate of upcrossing for a level of 5 m with
estimates carried out with several methods. The data set consists of 5407 events based on
the previously described numerical example. As can be seen from this figure, all the
estimates contain the theoretical value in their confidence intervals; therefore all the
methods were able to yield a correct result in this case. It is also clear from Figure 1.8
that the methods based on counting of events and time between events provide better
estimates than the method based on the time to the first upcrossing, as the latter
utilizes a smaller sample.

Figure 1.9 compares the theoretical rate of upcrossing for a level of 9 m with
estimates carried out with several methods. The data set consists of 153 upcrossings
in total and 111 first upcrossings. The estimates obtained with time between or before the
crossing(s) do not include the theoretical value in their confidence intervals, while the
method based on counting events still yields a correct estimate.

The reason why time-based estimates are biased is that the sample is limited in
time. It cannot include any data beyond the length of a record. Therefore the mean time
before the crossing is biased towards smaller values and the rate of events overestimates the
true value. The standard way of correcting such a bias is a procedure of censoring (see,
for example, Meeker and Escobar, 1998).
Figure 1.8. Comparison of Different Methods to Estimate Upcrossing Rate for the Numerical
Example for a Level of 5 m (Total Number of Upcrossings 5407).


[Plot categories: theoretical value, time between events, count of events, time to 1st event]

Figure 1.9. Comparison of Different Methods to Estimate Upcrossing Rate for the Numerical
Example for a Level of 9 m (Total Number of Upcrossings 153).

Subsequently a censoring procedure (Ayyub, et al., 2006) was applied to estimate
the rate of events from time intervals before the first upcrossing. The sample is made up
in the following way:

$$T_{Fc\,j} = \begin{cases} T_{Fj} & \text{if } N_{Uj} > 0 \\ T & \text{if } N_{Uj} = 0 \end{cases} \tag{1.104}$$


The censored mean time before the first crossing is expressed as

$$m^*_{Fc} = \frac{1}{N_{Fcr}} \sum_{k=1}^{N_r} T_{Fc\,k} \tag{1.105}$$

Here $N_r$ is the number of records while $N_{Fcr}$ is the number of records with at least one
upcrossing.
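
The censoring step can be illustrated with a short sketch. The function name, the use of `None` to mark a record without upcrossings, and the toy data are all hypothetical; the record duration is substituted for every record without a crossing, and the total is divided by the number of records that did have one.

```python
def censored_mean_time(first_crossing_times, record_duration):
    """Censored mean time before the first upcrossing.

    first_crossing_times holds one entry per record: the time of the first
    upcrossing in seconds, or None if the record has no upcrossing.
    """
    n_with_crossing = sum(t is not None for t in first_crossing_times)
    if n_with_crossing == 0:
        raise ValueError("no upcrossings observed; the estimate is undefined")
    total_time = sum(record_duration if t is None else t
                     for t in first_crossing_times)
    return total_time / n_with_crossing

# Toy data: four records of 1800 s, two of them without any upcrossing.
sample = [100.0, None, 300.0, None]
m_censored = censored_mean_time(sample, 1800.0)
m_uncensored = sum(t for t in sample if t is not None) / 2
```

In this toy case the uncensored mean is 200 s while the censored mean is 2000 s, which shows how ignoring records without events biases the mean time downwards and the rate upwards.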

Direct evaluation of the confidence interval of the censored mean may present
certain difficulties. If the number of observed crossings is small, the variance estimate of
the censored data also tends to be small, as most of the data points equal the duration
of simulation. A small variance estimate leads to a narrow confidence interval; this creates
a paradox, as decreasing the sample size is expected to increase statistical uncertainty.
This paradox is caused by the fact that censored data are expected to estimate bounds, not
the actual value; see, for example, (Meeker and Escobar 1998). Generally, censoring the
data does not increase or decrease the statistical uncertainty. Therefore, the width of the
confidence interval for the uncensored mean estimate can be used with the censored mean
estimate:


(1.106)

Here $\lambda^*_{Fc}$, $\lambda^*_{FcU}$, and $\lambda^*_{FcL}$ are the estimated event rates based on the censored time before the
first crossing and its upper and lower boundaries, respectively.

Figure 1.10 compares the theoretical rate of events with statistical values estimated
with two different methods; the level of upcrossings was equal to 9 m and the sample size
was 111. The rate of events estimated with time between events is not considered here, as
it is not clear how to censor this type of data.


As  it  can  be  seen  from  Figure  1.10,  the  censoring  removed  the  bias  observed  in 
Figure  1.9;  so  both  methods  have  shown  a  correct  estimate  for  the  example  considered. 
The  confidence  interval  for  the  second  method  (based  on  censored  time  to  the  first  event) 
is  slightly  wider,  as  a  sample  size  for  the  time  before  the  first  upcrossing  is  less  than  the 
sample  size  for  counting  crossings  (111  vs.  153;  34  records  had  more  than  one 
upcrossing). 
Figure 1.10. Comparison of Different Methods to Estimate Upcrossing Rate for the Numerical
Example for a Level of 9 m (Total Number of Upcrossings 153, Number of First Upcrossings 111).


1.3. Distributions  Related  with  Upcrossing  Events 

1.3.1. Limitations of Poisson Distribution

As was shown in the previous section, the derivation of the Poisson distribution
leads to a very important practical result: the distribution of time before or between
crossings is exponential. To characterize the exponential distribution, only one parameter
has to be estimated. This parameter is the average number of upcrossings per unit of time
(mean upcrossing rate). It is equivalent to the inverse of the average time
before/between crossings. Once this parameter is known, the probability of at least one
crossing can be trivially evaluated for any given time of exposure.

The derivation of the Poisson and exponential distributions was, however, not
free of assumptions. Application of the binomial distribution assumes repeating
independent Bernoulli trials on each time step. A Bernoulli trial is a random event that
can produce only two outcomes, usually called "success" or "failure". These outcomes
are related here to either the occurrence or non-occurrence of an upcrossing at this instant
of time.

The independence of Bernoulli trials, however, is not always guaranteed for the
upcrossing of a general stochastic process (i.e. it is not white noise and its
autocorrelation function is generally not a delta-function). Stochastic processes, such as
wave elevation, wave slope, or roll angle, possess a certain amount of inertia. The
instantaneous value of the process cannot change abruptly. Therefore the values in
neighboring time steps are dependent, provided the time step is reasonably small. This
dependence has a finite duration and the time it takes the autocorrelation function to drop
below a given level is often used as a measure of this dependence. As can be seen from
Figure 1.11 (a zoomed-in version of Figure 1.2), it takes 40-45 seconds for this
autocorrelation to die out to a level below 0.05. While this criterion remains somewhat
arbitrary, it can still be used as a first approximation. For a Gaussian (normal) process, the
autocorrelation function captures all of the information about dependence. For
non-Gaussian processes, the absence of correlation does not guarantee independence.


Figure 1.11. Sample Autocorrelation Function (Zoomed in From Figure 1.2)

Therefore, if the time between neighboring upcrossing events exceeds the time for the
autocorrelation function to die out, one can assume these events are independent. The
time between the events, however, is a random number; therefore, the judgment on
independence can only be made in a probabilistic sense.

Since the exponential distribution of the time before and between the events is the
most important consequence of the Poisson distribution, it makes sense to evaluate whether
such a hypothesis contradicts observed data. Standard statistical procedures used for these
purposes can be employed as part of the procedure to test whether independence of upcrossings
can be assumed and the Poisson distribution is applicable.

1.3.2.  Distribution  of  Time  before  Event 

Following the previous works of Ayyub, et al. (2006) and Belenky, et al. (2007), a
sample of time before the first upcrossing is analyzed. This analysis is carried out for
several crossing levels to examine the influence of the dependence between crossings and
sample size on the statistical estimates of the upcrossing rate.

For the given example with a Gaussian distribution, the theoretical mean time
before or between events may be calculated by simply inverting formula (1.20) or (1.83):


(1.107)


Formula (1.107) is an exponential function. The dependence is depicted in Figure
1.12 for the wave example considered in this section. It can be seen from this figure that
the time for the autocorrelation function to die out corresponds to a level of 4.2-4.4 m.

Figure 1.12. Theoretical mean time before or between upcrossing events shown as a function of
level of upcrossing a

It is well known that the mean value and standard deviation of the exponential
distribution are identical. This condition was analyzed by Belenky et al. (2007) as a
possible indicator of the applicability of the Poisson distribution and it was shown not to
be the best way. At the same time, a lower boundary of their confidence interval
compared with the autocorrelation decay time may produce relevant information.

The estimate of the mean value and variance of the time before the first event is
given by formula (1.08) and (1.99), while the confidence interval for the mean value estimate
can be calculated with formulae (1.100). In order to calculate the confidence interval for the
variance estimate, the value of the fourth central moment $M_4$ is necessary:


$$V\{V^*\} = \frac{M_4}{N} - \frac{N-3}{N(N-1)}\left(V^*\right)^2 \tag{1.108}$$


Here $N$ is the number of points and $V^*$ is the estimate of variance. Estimating the fourth central
moment directly from a statistical sample is known to be difficult, due to the sensitivity of
the numerical values to outliers. Therefore expressing the fourth moment through the
variance using a certain assumption on the character of the distribution has been a standard
technique. As the distribution of the time before the first event is expected to be
exponential, Belenky, et al. (2007) used the relation derived from the exponential
distribution:

$$M_4 = 9 \cdot V^2 \tag{1.109}$$

This  leads  to  the  following  expression  for  the  variance  of  the  variance  estimate: 
$$V\{V^*\} = \frac{8N - 6}{N(N-1)}\left(V^*\right)^2 \tag{1.110}$$


Once the mean value and variance of the variance estimate are found, a normal
distribution is used to find the boundaries of the confidence interval. This approach is quite
common; however, caution should be exercised, as the normal distribution is defined
from $-\infty$ to $+\infty$, while variance by definition is a non-negative value. Therefore, an
additional check is needed or the distribution must be truncated at zero.


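$$V^*_{FcrU} = V^*_{Fcr} + K_\beta \sqrt{V\{V^*_{Fcr}\}}; \quad V^*_{FcrL} = V^*_{Fcr} - K_\beta \sqrt{V\{V^*_{Fcr}\}} \tag{1.111}$$

The standard deviation is calculated:

$$\sigma^*_{Fcr} = \sqrt{V^*_{Fcr}}; \quad \sigma^*_{FcrU} = \sqrt{V^*_{FcrU}}; \quad \sigma^*_{FcrL} = \sqrt{V^*_{FcrL}} \tag{1.112}$$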
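
The comparison of the mean value and the standard deviation with their confidence intervals can be sketched as below. The function name and the synthetic test sample are hypothetical; the variance of the variance estimate uses the exponential-distribution relation (1.110), the lower variance boundary is truncated at zero as the text recommends, and a normal quantile is assumed for $K_\beta$.

```python
import math
from statistics import NormalDist, mean, variance

def mean_and_std_intervals(sample, beta=0.95):
    """Confidence intervals for the mean and standard deviation of a sample."""
    n = len(sample)
    k_beta = NormalDist().inv_cdf(0.5 * (1.0 + beta))
    m = mean(sample)
    v = variance(sample)
    half_m = k_beta * math.sqrt(v / n)
    var_of_v = (8.0 * n - 6.0) / (n * (n - 1.0)) * v**2   # relation (1.110)
    half_v = k_beta * math.sqrt(var_of_v)
    v_low = max(v - half_v, 0.0)                          # truncate at zero
    v_up = v + half_v
    mean_ci = (m - half_m, m + half_m)
    std_ci = (math.sqrt(v_low), math.sqrt(v_up))
    return m, math.sqrt(v), mean_ci, std_ci

def intervals_overlap(a, b):
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Synthetic sample: 200 evenly spaced quantiles of an exponential
# distribution with a mean of 60 s (illustration only).
quantile_sample = [-60.0 * math.log(1.0 - (i + 0.5) / 200.0) for i in range(200)]
m, s, mean_ci, std_ci = mean_and_std_intervals(quantile_sample)
statistically_equal = intervals_overlap(mean_ci, std_ci)
```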


The results are shown in Figure 1.13 for the level of crossing of 5 m. As can be
seen, the estimates of the mean value and standard deviation are statistically identical.
Also, the lower boundary of the mean value is above 50 seconds, i.e., more than the
interval of time for the autocorrelation function to die out. Both of these observations are
symptoms of the exponential distribution.

Figure 1.13. Estimates of mean value and standard deviation for time before the first event. Level of
crossing 5 m, 200 crossings total.

Figure 1.14 shows a histogram, the theoretical distribution, and three distributions
based on different statistical parameters. For the level of 5 m, all 200 records had at least
one upcrossing, so the size of the sample equals 200. The bin width for the histogram was
calculated as (Scott, 1979):


W  = 


(1.113) 


The histogram in Figure 1.14 is presented in terms of PDF:

$$f_i = \frac{H_i}{W \cdot N} \tag{1.114}$$


Here $H_i$ is the number of cases that fit in the $i$-th bin and $N$ is the total number of cases.
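
A histogram in terms of PDF can be sketched as follows. The bin width uses Scott's rule in its commonly quoted form, $W = 3.49\,\sigma\,N^{-1/3}$; whether the constant matches the one used in (1.113) is an assumption. The function names are hypothetical, and the optional `n_total` argument is only a convenience for normalizing by a count other than the sample size.

```python
from statistics import stdev

def scott_bin_width(sample):
    """Bin width from Scott's rule in its commonly quoted form (assumed constant)."""
    return 3.49 * stdev(sample) * len(sample) ** (-1.0 / 3.0)

def pdf_histogram(sample, width, n_total=None):
    """Bin counts divided by (width * n_total); bins start at zero.

    Assumes non-negative data such as time intervals.
    """
    if n_total is None:
        n_total = len(sample)
    n_bins = int(max(sample) // width) + 1
    counts = [0] * n_bins
    for value in sample:
        counts[int(value // width)] += 1
    return [c / (width * n_total) for c in counts]

# Toy time intervals in seconds.
times = [5.0, 12.0, 18.0, 33.0, 41.0, 47.0, 60.0, 95.0]
W = scott_bin_width(times)
f = pdf_histogram(times, W)
area = sum(f) * W    # equals 1 when n_total is the sample size
```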


A Pearson chi-square goodness-of-fit test was performed for each of the distributions.
The value of $\chi^2$, the number of degrees of freedom $d$, and $P(\chi^2, d)$, the probability that the
difference between the fit and the histogram is caused by random reasons, are also placed
into Figure 1.14. The test shows that the fit is good for all four curves, as the probability
is well above the accepted significance value of 0.05.
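
The goodness-of-fit calculation can be sketched with the standard library alone. This is an illustration rather than the procedure actually used: the binning, the treatment of degrees of freedom (number of bins minus one minus the number of fitted parameters) and the function names are assumptions. The tail probability is evaluated through the recurrence $Q(a+1, y) = Q(a, y) + y^a e^{-y}/\Gamma(a+1)$ for the regularized upper incomplete gamma function.

```python
import math

def chi2_sf(x, dof):
    """P(chi-square with `dof` degrees of freedom exceeds x), integer dof >= 1."""
    if x <= 0.0:
        return 1.0
    y = 0.5 * x
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-y)                 # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))      # Q(1/2, y)
    while a < 0.5 * dof - 1e-9:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q

def pearson_exponential_test(counts, edges, rate, n_fitted=0):
    """Chi-square statistic, degrees of freedom and tail probability for an
    exponential distribution with the given rate; edges has len(counts) + 1 items."""
    n = sum(counts)
    chi2 = 0.0
    for c, lo, hi in zip(counts, edges[:-1], edges[1:]):
        expected = n * (math.exp(-rate * lo) - math.exp(-rate * hi))
        chi2 += (c - expected) ** 2 / expected
    dof = len(counts) - 1 - n_fitted
    return chi2, dof, chi2_sf(chi2, dof)

# Values quoted in the legend of Figure 1.14, used as a spot check.
p_theoretical = chi2_sf(11.02, 11)
p_mean_time = chi2_sf(12.85, 10)
```

A tail probability above the chosen significance level (0.05 in the text) means the hypothesis of an exponential distribution is not rejected.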

- Theoretical Distribution Density; $\chi^2 = 11.02$, $d = 11$, $P(\chi^2, d) = 0.44$
- Based on Average Number per Unit of Time; $\chi^2 = 11.01$, $d = 10$, $P(\chi^2, d) = 0.36$
- Based on Average Time before 1st Crossing; $\chi^2 = 12.85$, $d = 10$, $P(\chi^2, d) = 0.23$
- Based on Average Censored Time before 1st Crossing; $\chi^2 = 12.85$, $d = 10$, $P(\chi^2, d) = 0.23$

Figure 1.14. Distribution of time intervals before the first crossing. Level of crossing 5 m, 200
crossings total.

Another  example  considered  above  was  for  9  m  level  of  crossing.  The  importance 
of  this  example  is  that  it  contains  89  records  without  any  upcrossings,  so  the  effect  of  the 
censoring  technique  can  be  demonstrated.  Figure  1.15  shows  a  histogram  (in  terms  of 
PDF)  along  with  distribution  curves  using  different  parameters  as  well  as  results  of 
Pearson  chi-square  goodness-of-fit  test.  The  insert  shows  estimates  for  mean  value  and 
standard  deviation  with  the  appropriate  confidence  interval. 


- Theoretical Distribution Density; $\chi^2 = 48.7$, $d = 4$, $P(\chi^2, d) = 6.83 \cdot 10^{-10}$
- Based on Average Number per Unit of Time; $\chi^2 = 47.6$, $d = 3$, $P(\chi^2, d) = 2.62 \cdot 10^{-10}$
- Based on Average Time before 1st Crossing; $\chi^2 = 25.7$, $d = 3$, $P(\chi^2, d) = 0.000011$
- Based on Average Censored Time before 1st Crossing; $\chi^2 = 43.06$, $d = 3$, $P(\chi^2, d) = 2.38 \cdot 10^{-9}$

Figure 1.15. Histogram of Time Before the First Crossing for a Level of 9 m (111 Crossings Total)

The estimates of the mean value and standard deviation are statistically different;
their confidence intervals do not have any overlap. None of the curves fit the histogram.
Obviously, the hypothesis of an exponential distribution is not supported by the observed
data. At the same time, it is known from the upcrossing theory (and was shown in the
derivations above) that time intervals before the first crossing must follow an exponential
distribution if the number of upcrossings follows a Poisson distribution.

The most important condition to satisfy is the independence of upcrossings. For
the considered example with a Gaussian process, independence of upcrossings is
achieved when the time between the neighboring upcrossings exceeds the time for the
autocorrelation function to die out. Increasing the level of crossing increases the mean
value of the time before upcrossing from about 60 seconds for a level of 5 m to about 800
seconds for a level of 9 m. At the same time, the hypothesis of an exponential
distribution was not rejected for a level of 5 m, but was rejected for a level of 9 m. The
reason for this rejection is likely to be unrelated to the applicability of the Poisson
distribution to the number of crossings.

In a sense, a similar situation was observed in Figure 1.9, where the results for
upcrossing rates calculated by counting events or averaging the time before the first event
were found to be drastically different. The reason for the difference was the limited
simulation duration, which can bias the average time, but not the number of events. This
discrepancy was resolved by censoring the data of time intervals before the first
upcrossing, where it was assumed that more upcrossings could be encountered if the
duration of records were longer.

The same assumption can be made for a histogram, as well, by adjusting the total
number of cases:

$$f_i = \frac{H_i}{W \cdot N_r} \tag{1.115}$$


Here  Nr  =200  is  the  total  number  of  records.  Obviously,  the  normalization  condition  for 
this  expression  is  no  longer  met,  as  the  rest  of  the  data  is  assumed  to  be  beyond  the  length 
of  a  record. 


The "censored" histogram is shown in Figure 1.16 along with the same set of
distribution curves. The Pearson chi-square goodness-of-fit test has shown that only the
curve based on uncensored average time before the first upcrossing does not fit the data.
All other curves show robust agreement with the data.

To complete the examination of the time before the first upcrossing, we now
examine cases where the conditions for a Poissonian process are violated. The crossing
level has been set to 3 m where 15201 upcrossings were observed. As can be seen
from Figure 1.17, most of the upcrossings are clustered and there are many cases when
neighboring periods have upcrossings. At the same time, some peaks remain below the
level and some of the time between crossings may be longer than just a period.

- Theoretical Distribution Density; $\chi^2 = 48.7$, $d = 4$, $P(\chi^2, d) = 0.83$
- Based on Average Number per Unit of Time; $\chi^2 = 47.6$, $d = 3$, $P(\chi^2, d) = 0.77$
- Based on Average Time before 1st Crossing; $\chi^2 = 25.7$, $d = 3$, $P(\chi^2, d) = 0.000088$
- Based on Average Censored Time before 1st Crossing; $\chi^2 = 43.06$, $d = 3$, $P(\chi^2, d) = 0.79$

Figure 1.16. Censored Histogram of Time Before the First Crossing for a Level of 9 m (111
Crossings Total)


Figure 1.17. Record 1 of the Process of Wave Elevations; Upcrossing Level 3 m


Figure 1.18 shows a histogram of the time before the first upcrossing. The
estimates for the mean value and variance (with confidence interval) are shown in the
inset of the figure. Despite the fact that these estimates are statistically equal, the entire
confidence interval of the mean value estimate is well below 40 seconds, the time
duration needed for the autocorrelation function to die out.

A  Pearson  chi-square  goodness-of-fit  test  rejects  the  exponential  distribution 
based  on  the  theoretical  upcrossing  rate  as  well  as  based  on  the  upcrossing  rate  estimated 
from  counting  of  events.  At  the  same  time,  the  test  does  not  reject  the  exponential 
distribution  based  on  estimate  of  mean  time  before  the  first  upcrossing  (there  is  no 
difference  between  censored  and  uncensored  mean  values,  as  every  record  has  at  least 
one  upcrossing).  These  results  may  seem  confusing,  but  certain  conclusions  can  still  be 
drawn. 




- Theoretical Distribution Density; $\chi^2 = 20.9$, $d = 10$, $P(\chi^2, d) = 0.022$
- Based on Average Number per Unit of Time; $\chi^2 = 21.4$, $P(\chi^2, d) = 0.01$
- Based on Average Time before 1st Crossing; $\chi^2 = 14.3$, $d = 9$, $P(\chi^2, d) = 0.11$
- Based on Average Censored Time before 1st Crossing; $\chi^2 = 14.3$, $d = 9$, $P(\chi^2, d) = 0.11$

Figure 1.18. Distribution of time intervals before the first crossing. Level of crossing 3 m, 200
crossings total.


The equality of mean value and standard deviation is a necessary, but not a
sufficient, condition for the exponential distribution. If the mean value and standard
deviation (including confidence interval extents) are smaller than the time required for
the autocorrelation function to die out, the Poisson distribution is likely inapplicable due
to data dependence. The entire confidence interval lying in the domain where the
autocorrelation is still significant (see Figure 1.11) suggests possible dependence
of neighboring upcrossings and therefore the inapplicability of the Poisson distribution.
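
The mean-versus-standard-deviation check can be illustrated with a minimal Python sketch. It is not taken from the report: the sample size, the seed, the rate of 0.01 1/s and the nearly periodic comparison sample are assumptions chosen only for illustration (the 11.6 s value is the mean zero-crossing period quoted in this section). For independent events the ratio of standard deviation to mean is close to one; for nearly periodic, dependent crossings it is not.

```python
import random
import statistics

def std_to_mean_ratio(sample):
    """Ratio of sample standard deviation to sample mean (1 for an exponential law)."""
    return statistics.stdev(sample) / statistics.mean(sample)

rng = random.Random(1)  # fixed seed (assumption) so the sketch is repeatable

# Independent events: exponential waiting times with an assumed rate of 0.01 1/s.
exponential_times = [rng.expovariate(0.01) for _ in range(20000)]

# Dependent, nearly periodic events: times clustered around a period of 11.6 s.
periodic_times = [abs(rng.gauss(11.6, 2.0)) for _ in range(20000)]

ratio_exponential = std_to_mean_ratio(exponential_times)
ratio_periodic = std_to_mean_ratio(periodic_times)
print(f"exponential sample:   std/mean = {ratio_exponential:.3f}")
print(f"near-periodic sample: std/mean = {ratio_periodic:.3f}")
```

A ratio near one does not by itself establish the exponential law, which is why the equality is treated here only as a necessary condition.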

There is a difference between the upcrossing rate calculated from the mean value
of time before the upcrossing and by the counting of events, exhibited in the different
outcomes of the goodness-of-fit test. Based on the previous paragraph, the difference
between the two could be explained by insufficient duration of a record or by dependence of
neighboring upcrossings resulting in violation of the Poisson distribution. Since every
record has at least one upcrossing, there is no difference between censored and
uncensored data. Therefore, the reason for the discrepancy is the violation of the Poisson
distribution.

The derivation of the theoretical formula for the upcrossing rate (equations 1.10-1.19)
relied only on the assumptions of continuity and stationarity, but not on the assumption of
the Poisson distribution. The formula (1.20) is correct for any level of crossing and the
theoretical curve in Figure 1.19 can be considered as a true answer. This confirms the
conclusion made above on the non-Poissonian character of the distribution of the number
of upcrossings for the level of 3 m.

Lowering the crossing level further, down to 1 m, in order to observe what
changes, if any, there are, we see an upcrossing on almost every period, see Figure 1.19.

Figure 1.19. Record 1 of the Process of Wave Elevations; Upcrossing Level 1 m


Figure 1.20 shows a histogram of time before the first crossing (in terms of PDF)
for the level of crossing of 1 m along with estimates of mean value and standard
deviation. In contrast with the previous example, with the 3 m level of crossing, there is no
indication of the applicability of the exponential distribution for the time before the first
crossing or the Poisson distribution for the number of upcrossings.


Legend of Figure 1.20 (Pearson chi-square goodness-of-fit results):
- Theoretical Distribution Density; $\chi^2$ = 49.3, d = 9, P($\chi^2$, d) = 1.47 x 10^-7
- Based on Average Number per Unit of Time; $\chi^2$ = 49.4, d = 8, P($\chi^2$, d) = 5.41 x 10^-8
- Based on Average Time before 1st Crossing; $\chi^2$ = 32.6, d = 8, P($\chi^2$, d) = 0.000073
- Based on Average Censored Time before 1st Crossing; $\chi^2$ = 32.6, d = 8, P($\chi^2$, d) = 0.000073

Figure 1.20. Distribution of Time Before the First Crossing for a Level of 1 m (200 Crossings Total)


The histogram shows a peak around 5 seconds. This is a little shorter than half of
the mean zero-crossing period of 11.6 seconds and probably represents the most
probable time from the start until the first upcrossing is encountered.

Note that if a record starts above the given level, the first upcrossing will only be
detected when the process comes back below the level and crosses it again. This may
have an influence on the shape of the histogram.
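
The effect of a record that starts above the level can be seen in a short Python sketch of upcrossing detection in a sampled record. The function name `upcrossing_indices` and the sample values are hypothetical; the report does not describe its detection procedure, so this is only one plausible way to do it.

```python
def upcrossing_indices(samples, level):
    """Return indices i where the record goes from below `level` to at or above it."""
    return [
        i
        for i in range(1, len(samples))
        if samples[i - 1] < level <= samples[i]
    ]

# Hypothetical wave elevations (m) sampled at equal time steps.
starts_below = [0.5, 1.5, 0.2, -0.8, 0.3, 1.2, 0.4]
starts_above = [1.5, 1.2, 0.2, -0.8, 0.3, 1.2, 0.4]

print(upcrossing_indices(starts_below, 1.0))  # [1, 5]
print(upcrossing_indices(starts_above, 1.0))  # [5]
```

The second record spends its first samples above the 1 m level, so its first detected upcrossing comes only after the process has returned below the level, which lengthens the recorded time before the first crossing.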




1.3.3.  Distribution  of  the  Time  Between  Events 

The derivation of the formulae for the time between and before events (1.30-1.37)
is based on the Poisson distribution for the number of upcrossings and targets a time
without upcrossings. This random number can be interpreted as the time before the first
upcrossing or as the time between upcrossings. Therefore, the exponential distribution
must be equally applicable to both these random variables. Consideration of the
distribution of time before the first event, as in the section above, has shown that the
duration of a record is another major factor. In order to demonstrate the exponential
distribution statistically, the records have to be long enough, so that all (or a statistically
significant number) of the records have at least one upcrossing, in addition to satisfying
the Poisson distribution conditions.

The objective of this section is to demonstrate, with numerical examples, how the
upcrossing level affects the statistical distribution of time between events. Setting a
lower level gives a statistically significant number of upcrossings (or records with
upcrossings), but may lead to a violation of the Poisson distribution condition if
upcrossings occur too frequently to be independent random events. On the other hand, if
the level is too high, the duration of a record may be insufficient.

Figure 1.21 shows a histogram (in terms of PDF) of time intervals between
upcrossings along with estimates of mean value and standard deviation. Results of
Pearson chi-square goodness-of-fit tests are also shown for the five curves as specified in
the figure. The confidence intervals of the mean value and standard deviation have
significant overlap. Also, the lower boundary of the confidence interval for the mean
value estimate is about 340 seconds, enough time for the autocorrelation function to die
out, so applicability of the exponential distribution is not excluded. The shape of the
histogram also suggests an exponential distribution.


Legend of Figure 1.21 (Pearson chi-square goodness-of-fit results):
- Theoretical Distribution Density; $\chi^2$ = 32.0, d = 13, P($\chi^2$, d) = 0.0024
- Based on Average Number per Unit of Time; $\chi^2$ = 37.5, d = 12, P($\chi^2$, d) = 0.00018
- Based on Average Time before 1st Crossing; $\chi^2$ = 9.19, d = 12, P($\chi^2$, d) = 0.69
- Based on Average Censored Time before 1st Crossing; $\chi^2$ = 18.6, d = 12, P($\chi^2$, d) = 0.098
- Based on Average Time between Crossings; $\chi^2$ = 8.11, d = 12, P($\chi^2$, d) = 0.77

Figure 1.21. Distribution of Time Between Upcrossings for a Level of 7.5 m (721 Crossings Total, 196
Records with at Least One Crossing)

As indicated above, five different curve fits were tried. All of these curves were
exponential distributions, but the parameter was calculated differently. The Pearson
chi-square goodness-of-fit test shows that three of the methods for selecting the distribution
parameter yield a fitted distribution that is not rejected (the probability is greater than
0.05, the generally accepted significance level); that is, the curves match the data. At the
same time, two curves show a probability less than 0.05, so the hypothesis is rejected,
meaning that these particular curves do not match the data.
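
How such a probability can be obtained is sketched below in Python. This is an illustration, not the report's implementation: the binning with an open-ended last bin is an assumption, and so is the rule for the degrees of freedom (number of bins minus one, minus the number of parameters estimated from the sample), although that rule is consistent with the theoretical curve having one more degree of freedom than the fitted curves in the figure legends. The function names are hypothetical.

```python
import math

def chi2_sf(chi2, dof):
    """P(chi-square variable with integer `dof` exceeds chi2).

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    """
    if chi2 <= 0.0:
        return 1.0
    y = 0.5 * chi2
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-y)             # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y)
    while a < 0.5 * dof - 1e-9:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)

def exponential_gof(intervals, rate, bin_width, n_bins, n_estimated):
    """Pearson chi-square test of `intervals` against an exponential law with `rate`.

    The last bin collects everything beyond (n_bins - 1) * bin_width (an assumption).
    Returns (chi-square statistic, degrees of freedom, probability).
    """
    observed = [0] * n_bins
    for t in intervals:
        observed[min(int(t // bin_width), n_bins - 1)] += 1
    n = len(intervals)
    chi2 = 0.0
    for j in range(n_bins):
        left = j * bin_width
        p = math.exp(-rate * left)
        if j < n_bins - 1:
            p -= math.exp(-rate * (left + bin_width))
        expected = n * p
        chi2 += (observed[j] - expected) ** 2 / expected
    dof = n_bins - 1 - n_estimated
    return chi2, dof, chi2_sf(chi2, dof)

# Probability for the theoretical curve of Figure 1.21 (chi-square 32.0, d = 13).
print(round(chi2_sf(32.0, 13), 4))  # 0.0024
```

A probability above the 0.05 significance level means the curve is not rejected; a value below it means the curve does not match the data.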

The distribution parameter calculated through averaging of the intervals between
upcrossings (1.95) passes the test. This confirms the above observation on the exponential
character of the distribution, in general.

However, the theoretical distribution does not match the data. The distribution
parameter was calculated by formula (1.83) using the variance of wave elevations and their
derivatives "as discretized". This discrepancy is caused by an insufficient length of
record for this particular level of crossing (Belenky et al., 2007). The reason is that the
sample is limited by the length of a record, and intervals between upcrossings longer than
the duration of a record are absent from the sample. These intervals are statistically
significant for the level of 7.5 m.

The rate of events calculated as an average number of upcrossings per unit of time
(1.82) was found to be quite close to the theoretical value, see Figure 1.22. Therefore,
the exponential distribution based on this value is very close to the theoretical one. Then, for
the same reason as the theoretical one, it is rejected by the observed data, as the intervals
between upcrossings that are longer than the duration of a record still have statistical
significance for the considered level.
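
The record-length effect can be reproduced with a small simulation of an ideal Poisson flow. The Python sketch below is only an illustration under stated assumptions: the rate (0.002 1/s), record length (1800 s), number of records (2000) and seed are invented, and the censored estimator is the standard one for exponential data cut off at the record length, which may differ from the report's own censoring formula.

```python
import random

def simulate_records(rate, duration, n_records, rng):
    """Poisson flow: event times in each record, built from exponential gaps."""
    records = []
    for _ in range(n_records):
        t, events = rng.expovariate(rate), []
        while t < duration:
            events.append(t)
            t += rng.expovariate(rate)
        records.append(events)
    return records

def rate_by_counting(records, duration):
    """Average number of events per unit of time."""
    return sum(len(r) for r in records) / (len(records) * duration)

def rate_from_intervals(records):
    """Inverse of the mean interval between successive events inside a record."""
    gaps = [b - a for r in records for a, b in zip(r, r[1:])]
    return len(gaps) / sum(gaps)

def rate_from_first_event(records, duration, censored):
    """Inverse mean time before the first event; empty records add exposure if censored."""
    first = [r[0] for r in records if r]
    exposure = sum(first)
    if censored:
        exposure += duration * (len(records) - len(first))
    return len(first) / exposure

TRUE_RATE = 0.002  # 1/s, assumed
DURATION = 1800.0  # s, assumed record length
rng = random.Random(7)
records = simulate_records(TRUE_RATE, DURATION, 2000, rng)

counting = rate_by_counting(records, DURATION)
between = rate_from_intervals(records)
first_raw = rate_from_first_event(records, DURATION, censored=False)
first_censored = rate_from_first_event(records, DURATION, censored=True)
print(f"counting {counting:.5f}, between {between:.5f}, "
      f"first {first_raw:.5f}, first censored {first_censored:.5f}")
```

Even though the simulated flow is exactly Poisson, the rate obtained from the intervals between events comes out noticeably higher than the true rate, because intervals longer than a record can never be observed; the counting estimate does not suffer from this.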

Figure 1.22. Comparison of Different Methods to Estimate the Parameter of the Exponential
Distribution (Upcrossing Rate) for a Level of 7.5 m (721 Crossings Total, 196 Records With at Least
One Crossing)

The distribution based on the uncensored average time before the first upcrossing
is not rejected by the observed data. This method gives similar results to the one based
on the average time between upcrossings, see also Figure 1.22. Both Figure 1.21 and
Figure 1.22 suggest that the average time before the first upcrossing has a
bias leading to overestimation of the rate of events, as was discussed in the previous section.
It is notable that the bias exists for the level of 7.5 m, with only 4 records without
upcrossings; however, it was enough to move the estimate away from the theoretical
solution and provide a significant probability that the fit to biased data is good.

The censoring procedure moves the estimate back to the theoretical solution;
however, the relatively wide confidence interval still allows the goodness-of-fit test not to
reject the distribution based on the censored average time before the first crossing.

Figure 1.23 shows a histogram (in terms of PDF) of time intervals between
upcrossings along with estimates of the mean value and standard deviation calculated for
a level of 9 m. The results of the Pearson chi-square goodness-of-fit test are also shown
for five curves as specified in the figure. None of them fit. Confidence intervals for the
estimates of the mean value and standard deviation do not have any overlap. Similar
observations were made for the distribution of time intervals before the first crossing, see
Figure 1.15. However, the "censored" histogram in Figure 1.16 has confirmed that this
was a case of a significant deficiency in the duration of the records, as the
censored data confirms that the distribution is, in fact, exponential.


Legend of Figure 1.23 (Pearson chi-square goodness-of-fit results):
- Theoretical Distribution Density; $\chi^2$ = 80.2, d = 4, P($\chi^2$, d) = 0.0
- Based on Average Number per Unit of Time; $\chi^2$ = 78.5, d = 4, P($\chi^2$, d) = 0.0
- Based on Average Time before 1st Crossing; $\chi^2$ = 22.9, d = 4, P($\chi^2$, d) = 0.000133
- Based on Average Censored Time before 1st Crossing; $\chi^2$ = 71.3, d = 4, P($\chi^2$, d) = 1.2 x 10^-14
- Based on Average Time between Crossings; $\chi^2$ = 28.37, d = 4, P($\chi^2$, d) = 0.000011

Figure 1.23. Distribution of Time Between Upcrossings for a Level of 9 m (153 Crossings Total, 111
Records with at Least One Crossing)


It is not clear how censoring can be applied to time intervals between the events;
therefore, the case in Figure 1.23 cannot be resolved with this kind of statistics. In
reality, the distribution, of course, must be exponential, as increasing the level cannot
lead to violation of the Poisson distribution.

Comparing Figure 1.22 and Figure 1.23 is useful as it shows a tendency in the
behavior of the curves when the level is increased. The theoretical distribution
practically coincides with the curves based on the average number of upcrossings per unit of
time and on the censored average time before the first upcrossing. The curves based on the
uncensored average time before the first upcrossing and the average time between upcrossings
move away from the theoretical distribution, but remain close to each other. This tendency
can also be seen in Figure 1.8 and Figure 1.9.

The opposite tendency can be observed when the crossing level is lowered, see
Figure 1.24. All five curves are visually closer to each other in comparison with Figure
1.21. The appearance of the histogram also suggests the exponential character of the
distribution. The confidence intervals for the estimates of the mean value and standard
deviation have significant overlap. This also suggests that the exponential distribution is
possibly applicable.


Legend of Figure 1.24 (Pearson chi-square goodness-of-fit results):
- Theoretical Distribution Density; $\chi^2$ = 28.0, d = 18, P($\chi^2$, d) = 0.062
- Based on Average Number per Unit of Time; $\chi^2$ = 34.9, d = 17, P($\chi^2$, d) = 0.0064
- Based on Average Time before 1st Crossing; $\chi^2$ = 24.1, d = 17, P($\chi^2$, d) = 0.12
- Based on Average Censored Time before 1st Crossing; $\chi^2$ = 24.1, d = 17, P($\chi^2$, d) = 0.12
- Based on Average Time between Crossings; $\chi^2$ = 22.3, d = 17, P($\chi^2$, d) = 0.17

Figure 1.24. Distribution of Time Between Upcrossings for a Level of 6.75 m (1421 Crossings Total,
All 200 Records with at Least One Crossing)

Only the distribution based on the average number of crossings per unit of time is
rejected by the goodness-of-fit test, while all other fits are supported by the data. The
reason can be seen in Figure 1.25, where all the parameters are compared. First, the rate
calculated from the average time between upcrossings is still different from the
theoretical value. Apparently, the duration of the record is still insufficient. The rate of
upcrossings in Figure 1.25 calculated by counting includes the theoretical solution in its
confidence interval, but the middle of the confidence interval happens to be a bit lower
than the theoretical solution. This small difference was enough to reject the distribution
based on counting of events.

Further lowering the crossing level (to 5.75 m) leads to an unexpected result: none
of the curves fit the data, although all the curves are very close to each other, see Figure
1.26. At the same time, the estimates of mean value and standard deviation have substantial
overlap. In addition, the lower boundary of the 95% confidence interval lies around 100
seconds. This is still enough time for the autocorrelation function to die out. Detailed
analysis shows that the absence of agreement is due to the value at the first bin; it is
noticeably higher than expected. Additionally, the value at the second bin seems to be a
bit lower. This may suggest some sensitivity to the width of the bin, which has been calculated with
formula (1.113) so far. A relatively small change of the bin width, from 23.9 s (Figure
1.26 upper) to 30 s (Figure 1.26 lower), eliminates this effect. All of the curves are
supported by the data in the lower histogram shown in Figure 1.26.

Figure 1.25. Comparison of Different Methods to Estimate the Parameter of the Exponential Distribution
(Upcrossing Rate). Level of Crossing 6.75 m, 1421 Crossings Total, All 200 Records With at Least One
Crossing


Figure 1.27 shows histograms for the crossing level of 5 m. The top histogram in
the figure uses the bin width according to formula (1.113); similar to
the previous case, all the curves are very close to each other, but none of them fit the
data. Now the value in the first bin is very small, while the second bin gives a very large
number. It is still possible to make the curves fit by a manual change of the bin width.
However, the required change is much larger, increasing the bin width from 12 s (upper
histogram in Figure 1.27) to 36 s (lower histogram in Figure 1.27). Setting the maximum
time to 400 s was also needed in order to achieve a satisfactory goodness-of-fit.

At the same time, other symptoms of the exponential distribution are still present; the
estimates of the mean value and the standard deviation show substantial overlap in their
confidence intervals. The lower boundary of the confidence interval for the mean value
estimate is about 62 seconds, still enough time for the autocorrelation function to die out.

The first bin of the upper histogram in Figure 1.26 and the second bin of the upper
histogram in Figure 1.27 correspond to the value of 10-15 seconds. This range includes
the mean period of the stochastic process. This may indicate that the reason the
histogram deviates from the exponential distribution is the presence of wave groups. The
level is low enough so that successive waves in a group all cross it. As these wave
groups are still rare, they cannot yet break the Poisson distribution completely. Some
integral characteristics are still present. At the same time, they are frequent enough to
cause a local distortion of the exponential distribution, which is detected by the
chi-square goodness-of-fit test.

Legend of Figure 1.26, upper histogram (bin width Wb = 23.9 s; horizontal axis: Time, s):
- Theoretical Distribution Density; $\chi^2$ = 64.4, d = 29, P($\chi^2$, d) = 0.00017
- Based on Average Number per Unit of Time; $\chi^2$ = 64.8, d = 28, P($\chi^2$, d) = 0.000096
- Based on Average Time before 1st Crossing; $\chi^2$ = 61.6, d = 28, P($\chi^2$, d) = 0.00025
- Based on Average Censored Time before 1st Crossing; $\chi^2$ = 61.6, d = 28, P($\chi^2$, d) = 0.00025
- Based on Average Time between Crossings; $\chi^2$ = 56.2, d = 28, P($\chi^2$, d) = 0.0012

Legend of Figure 1.26, lower histogram (bin width Wb = 30 s):
- Theoretical Distribution Density; $\chi^2$ = 32.9, d = 23, P($\chi^2$, d) = 0.084
- Based on Average Number per Unit of Time; $\chi^2$ = 33.2, d = 22, P($\chi^2$, d) = 0.059
- Based on Average Time before 1st Crossing; $\chi^2$ = 31.1, d = 22, P($\chi^2$, d) = 0.094
- Based on Average Censored Time before 1st Crossing; $\chi^2$ = 31.1, d = 22, P($\chi^2$, d) = 0.094
- Based on Average Time between Crossings; $\chi^2$ = 25.7, d = 22, P($\chi^2$, d) = 0.27

Figure 1.26. Distribution of Time Between Upcrossings for a Level of 5.75 m (3269 Crossings Total,
All 200 Records With at Least One Crossing)

Legend of Figure 1.27, upper histogram (bin width Wb = 12 s, Max(T) = 549 s):
- Theoretical Distribution Density; $\chi^2$ = 1510, d = 45, P($\chi^2$, d) = 0.0
- Based on Average Number per Unit of Time; $\chi^2$ = 1530, d = 44, P($\chi^2$, d) = 0.0
- Based on Average Time before 1st Crossing; $\chi^2$ = 1452, d = 44, P($\chi^2$, d) = 0.0
- Based on Average Censored Time before 1st Crossing; $\chi^2$ = 1452, d = 44, P($\chi^2$, d) = 0.0
- Based on Average Time between Crossings; $\chi^2$ = 1487, d = 44, P($\chi^2$, d) = 0.0

Legend of Figure 1.27, lower histogram (bin width Wb = 36 s, Max(T) = 400 s):
- Theoretical Distribution Density; $\chi^2$ = 11.4, d = 11, P($\chi^2$, d) = 0.41
- Based on Average Number per Unit of Time; $\chi^2$ = 17.6, d = 10, P($\chi^2$, d) = 0.062
- Based on Average Time before 1st Crossing; $\chi^2$ = 14.6, d = 10, P($\chi^2$, d) = 0.15
- Based on Average Censored Time before 1st Crossing; $\chi^2$ = 14.6, d = 10, P($\chi^2$, d) = 0.15
- Based on Average Time between Crossings; $\chi^2$ = 6.12, d = 10, P($\chi^2$, d) = 0.80

Figure 1.27. Distribution of Time Between Upcrossings for a Level of 5 m (5407 Crossings Total, All
200 Records With at Least One Crossing)

To complete the study of the influence of the crossing level, two more distributions
are considered here. At the level of 3 m, shown in Figure 1.28, the estimates of the mean
value and standard deviation do not have any overlap of confidence intervals. Both
confidence intervals are very small, as would be expected from a sample containing
15,201 data points. The mean value range is below 25 seconds, where the autocorrelation
function has not entirely decayed. The curves are separated into two groups, as the mean
value of time before the first upcrossing is apparently no longer equal to the mean time
between the events, nor is it equal to the inverse of the event rate.

Figure 1.28. Distribution of Time Between Upcrossings for a Level of 3 m (15201 Crossings Total,
All 200 Records With at Least One Crossing)

The histogram in Figure 1.28 demonstrates a pronounced peak around 10-15
seconds; its character is obviously not exponential. A similar picture can be seen when
the level is set to 1 m, see Figure 1.29. The separation of the two groups of curves is
even larger, while the shape of the histogram may bear some resemblance to the normal
distribution.


Figure 1.29. Distribution of Time Between Upcrossings for a Level of 1 m (25543 Crossings Total,
All 200 Records With at Least One Crossing)

1.3.4.  Cumulative  Distribution  of  Time  between  Events

The derivations (1.30-1.37) prove that the time interval between upcrossings must
follow an exponential distribution if the number of upcrossings during a fixed time is
governed by the Poisson distribution. However, an attempt to use statistics for the time
interval between upcrossings encounters certain difficulties. The upcrossing
rate calculated as the inverse of the average time between upcrossings may be prone to a
bias, which is caused by insufficient record duration.

Therefore, it is desirable to have another method to check if the assumption of a
Poisson distribution is still valid, one that would be free of the drawbacks described above. One
such method was described by Belenky et al. (2007). This method is based on
estimating the probability of at least one upcrossing during a given time span.

Consider a time interval from a time instant $t_b$ until $t_e$ with the duration
$\Delta T = t_e - t_b$. The probability that at least one upcrossing occurs during this time
interval is estimated as:

$$P_T^*(k \ne 0) = 1 - \frac{N_{k=0}}{N_R} \qquad (1.116)$$

Here $k$ is the number of upcrossings, $N_{k=0}$ is the number of records without a single upcrossing
within the given interval, and $N_R$ is the total number of records available. This calculation is
illustrated in Figure 1.30, where upcrossings are shown as black dots. For the example
shown in this figure, the probability of at least one upcrossing from the instant $t_b$ to the
instant $t_e$ is 0.75: only one record out of four did not have any upcrossings during the
interval $\Delta T$.


Figure 1.30. On Estimation of the Probability of at Least One Event During Given Interval T


This estimate tends to the theoretical probability as the number of records goes to infinity:

$$P_T(k \ne 0) = \lim_{N_R \to \infty} \left( P_T^*(k \ne 0) \right) = \lim_{N_R \to \infty} \left( 1 - \frac{N_{k=0}}{N_R} \right) \qquad (1.117)$$

Assuming a Poisson distribution for the number of upcrossings leads to the following
expression:

$$\lim_{N_R \to \infty} \left( 1 - \frac{N_{k=0}}{N_R} \right) = P_T(k \ne 0) = 1 - \exp(-\lambda \Delta T) \qquad (1.118)$$

Here $\lambda$ is the theoretical upcrossing rate. For a finite number of records, equation (1.116)
can be expressed in terms of the estimate of the rate $\lambda^*$:

$$1 - \frac{N_{k=0}}{N_R} = 1 - \exp(-\lambda^* \Delta T) \qquad (1.119)$$

where the estimate of the rate $\lambda^*$ can be evaluated as:

$$\lambda^* = -\frac{1}{\Delta T} \ln \left( \frac{N_{k=0}}{N_R} \right) \qquad (1.120)$$
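
Estimates (1.116) and (1.120) can be sketched in a few lines of Python. The function names and the four hypothetical records below are assumptions made for illustration, as is the choice to treat the interval as half-open, $[t_b, t_e)$; the numbers are chosen to mirror the one-record-in-four example of Figure 1.30.

```python
import math

def prob_at_least_one(records, t_b, t_e):
    """Estimate (1.116): share of records with at least one upcrossing in [t_b, t_e)."""
    n_empty = sum(1 for r in records if not any(t_b <= t < t_e for t in r))
    return 1.0 - n_empty / len(records)

def rate_estimate(records, t_b, t_e):
    """Estimate (1.120): rate implied by the share of records with no upcrossing."""
    p_empty = 1.0 - prob_at_least_one(records, t_b, t_e)
    if p_empty <= 0.0:
        raise ValueError("every record has an upcrossing; the estimate is unbounded")
    return -math.log(p_empty) / (t_e - t_b)

# Hypothetical upcrossing times (s) in four records; only the third record
# has no upcrossing between 100 s and 300 s.
records = [[120.0, 410.0], [250.0], [50.0, 330.0], [101.0, 180.0, 299.0]]
p = prob_at_least_one(records, 100.0, 300.0)
lam = rate_estimate(records, 100.0, 300.0)
print(p)    # 0.75
print(lam)  # equals -ln(0.25) / 200
```

Note that the rate estimate is undefined when every record contains an upcrossing in the chosen interval, which is why the sketch raises an error in that case.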


As was shown above (see formula 1.33), expression (1.117) can be interpreted as an
estimate of the CDF of the time between crossings calculated for the argument $\Delta T$:

$$F^*(\Delta T) = 1 - \exp(-\lambda^* \Delta T) = 1 - \frac{N_{k=0}}{N_R} \qquad (1.121)$$

Since the stochastic process is considered stationary, the theoretical probability $P_T(k \ne 0)$
is not time dependent. Therefore, the uncertainty of the estimate $F^*(\Delta T)$ can be reduced
by averaging its value over multiple intervals of size $\Delta T$:

$$F^*(\Delta T) = 1 - \frac{1}{N_{\Delta T}} \sum_{i=0}^{N_{\Delta T}-1} \frac{N_{k=0}(t_b = i \Delta T,\; t_e = (i+1) \Delta T)}{N_R} \qquad (1.122)$$

Here $N_{k=0}(t_b, t_e)$ is the number of records that did not have any upcrossings from $t_b$ till
$t_e$; $N_{\Delta T}$ is the number of whole intervals $\Delta T$ contained in a record of length $T_S$:

$$N_{\Delta T} = \frac{T_S}{\Delta T} \qquad (1.123)$$

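Formula (1.120) is an estimate for the cumulative distribution function for one value of
$\Delta T$. To calculate the rest of the estimated function, all the calculations have to be
repeated for an array of points $T_j$:

$$\{T_j\} = \Delta T,\; 2\Delta T,\; 3\Delta T,\; \ldots,\; j\Delta T,\; \ldots,\; T_S \qquad (1.124)$$

Finally, the estimate of the CDF at the points $T_j$ can be expressed as:

$$F^*(T_j) = 1 - \frac{1}{N_{T_j}} \sum_{i=0}^{N_{T_j}-1} \frac{N_{k=0}(t_b = i j \Delta T,\; t_e = (i+1) j \Delta T)}{N_R}, \quad j = 1, \ldots, N_{\Delta T} \qquad (1.125)$$

where, by analogy with (1.123), $N_{T_j}$ is the number of whole intervals of duration $T_j$ contained
in a record.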
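
The window-counting estimate of the CDF can be sketched in Python as follows. This is an illustrative reading of the procedure rather than the report's code: the function name, the half-open windows, and the use of only whole windows of each length are assumptions, and the two-record data set is invented so that the result can be checked by hand.

```python
def empirical_cdf(records, record_length, delta_t):
    """Statistical CDF of the time between upcrossings by counting empty windows.

    For each T_j = j * delta_t the record is cut into whole windows of length T_j;
    the share of records with no upcrossing is averaged over these windows.
    """
    n_steps = int(record_length / delta_t + 1e-9)
    points = []
    for j in range(1, n_steps + 1):
        width = j * delta_t
        n_windows = int(record_length / width + 1e-9)
        empty_share = 0.0
        for i in range(n_windows):
            t_b, t_e = i * width, (i + 1) * width
            n_empty = sum(1 for r in records if not any(t_b <= t < t_e for t in r))
            empty_share += n_empty / len(records)
        points.append((width, 1.0 - empty_share / n_windows))
    return points

# Hypothetical data: two 20 s records, each with a single upcrossing at 5 s.
print(empirical_cdf([[5.0], [5.0]], record_length=20.0, delta_t=10.0))
# [(10.0, 0.5), (20.0, 1.0)]
```

With 10 s windows, the first window always contains the upcrossing and the second never does, giving 0.5; with a single 20 s window every record contains an upcrossing, so the estimate reaches unity.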


To check the goodness-of-fit of the exponential CDF, Belenky et al. (2007) used the
Kolmogorov-Smirnov test (also known as the K-S test). The
description of the K-S test approach, taken from Belenky et al. (2007), is provided below
for convenience. The metric for the goodness-of-fit is derived from the absolute value of
the maximum difference between the suggested and statistical CDF:

$$D = \max \left| F^*(x) - F(x) \right| \qquad (1.126)$$

The criterion itself is expressed through the maximum difference $D$:

$$\nu = D \sqrt{n} \qquad (1.127)$$

Here, $n$ is the number of data points. It is proven that if the statistical estimate of the
cumulative distribution function is evaluated for independent data, then for any distribution
$F(x)$, with an increasing number of points:

$$\lim_{n \to \infty} P\left( D \sqrt{n} \ge \nu \right) = 1 - \sum_{k=-\infty}^{\infty} (-1)^k \exp(-2 k^2 \nu^2) \qquad (1.128)$$

In practice, an upper bound of $10^3$ (instead of infinity) for the summation yields
satisfactory results. Formula (1.126) yields the probability that the difference between
the observed and suggested distributions is caused by random reasons, if $n$ is large
enough:

$$P_{rnd} = 1 - \sum_{k} (-1)^k \exp(-2 k^2 \nu^2) \qquad (1.129)$$

It should be noted, however, that the K-S test does not account for the number of
statistical "degrees of freedom", and it is only sensitive to the sample size. This implies that
the theoretical distribution must be suggested on theoretical grounds only and that
it should not contain any parameters derived from the statistical sample.
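
A minimal Python sketch of the K-S calculation is given below. The function names are hypothetical, the summation limit of 1000 terms follows the practical bound mentioned in the text, and the guard for very small values of the criterion is an added assumption: the truncated alternating series is not reliable there, while the limiting probability is one.

```python
import math

def ks_distance(cdf_points, rate):
    """Largest gap between statistical CDF points (T_j, F_j) and 1 - exp(-rate * T_j)."""
    return max(abs(f - (1.0 - math.exp(-rate * t))) for t, f in cdf_points)

def ks_probability(distance, n_points, n_terms=1000):
    """Probability that a gap of this size arises by chance (large-sample series)."""
    nu = distance * math.sqrt(n_points)
    if nu < 0.2:
        return 1.0  # assumed guard: the truncated series is unreliable near zero
    series = sum(
        (1.0 if k % 2 == 0 else -1.0) * math.exp(-2.0 * k * k * nu * nu)
        for k in range(-n_terms, n_terms + 1)
    )
    return min(1.0, max(0.0, 1.0 - series))

# A maximum difference of 0.136 with 100 points sits near the 0.05 significance level.
print(round(ks_probability(0.136, 100), 3))  # 0.049
```

As the text stresses, the resulting probability is only strictly meaningful when the rate passed to `ks_distance` is known a priori rather than estimated from the same sample.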

Belenky et al. (2007) used the K-S test to check if upcrossings follow a Poisson
flow. However, that source does not contain any information on how to set the step $\Delta T$. Here
it is associated with the bin width for time intervals between the upcrossings:


AT  = 


(1.130) 


Here $\sigma_T$ is the standard deviation of the intervals between the upcrossings and $N_t$ is their
quantity.

Figure 1.31 shows the statistical CDF calculated with formula (1.123) along with
the theoretical curve. The CDFs for two levels of crossing are shown here, 7.5 m and
6.75 m. Other statistics and data for these levels are shown in Figure 1.3-Figure 1.5,
Figure 1.21, Figure 1.22 and Figure 1.24 and Figure 1.25 respectively. The K-S test did
not reject the hypothesis of an exponential distribution. The statistical CDF did not reach
unity for the level of 7.5 m, as there were four records without any upcrossings. However,
for the level of 6.75 m, all the records had at least one upcrossing, so the statistical CDF
did reach unity.

Figure 1.31. Cumulative Distributions of Time Between Upcrossings for Levels of 7.5 m and 6.75 m


The  outcome  of  the  test  for  both  levels  7.5  m  and  6.75  m  is  not  surprising.  All 
analysis  carried  out  before  have  indicated  that  the  exponential  distribution  was,  in  fact, 
applicable.  As  it  can  be  seen  from  the  insert  in  Figure  1.21,  the  lower  boundary  of  the 
mean  value  is  around  340  s,  which  is  enough  time  for  the  autocorrelation  function  to  die 
out  (see  Figure  1.11).  The  overlap  between  confidence  intervals  of  the  mean  value 
estimate  and  standard  deviation  is  significant  (See  insert  in  Figure  1 .21 ). 

The  hypothesis  that  time  intervals  between  the  uperossings  have  exponential 
distribution  was  not  rejected  with  the  Pearson  chi-square  test  (probability  0.77,  see  Figure 
1.21).  This  hypothesis  was  also  not  rejected  by  the  K-S  test,  see  Figure  1.31.  The 
difference  between  these  tests  is  that  the  Pearson  chi-square  test  was  applied  to  the 
sample  of  time  intervals  between  the  uperossing.  The  Pearson  chi-square  test  has  shown 
that  the  exponential  distribution  with  theoretical  parameter  docs  not  fit  that  data 
(probability  is  only  0.0024,  see  Figure  1.21).  The  K-S  test  in  Figure  1.21  has  shown  that 
the  theoretical  distribution  fits  the  observed  data  (probability  is  0. 13). 

The reason for this difference was already explained in the previous section. It is a statistical bias of the sample of time intervals between upcrossings. This bias is introduced by the absence of longer-than-record intervals. It decreases the mean value of the time between upcrossings and drives the statistical estimate of the distribution parameter (rate of events) up, see Figure 1.22. As can be seen from this figure, the bias is absent in the parameter calculated through averaging of the number of upcrossings per unit of time. The bias can also be corrected by censoring the estimate of the mean time before the first upcrossing. Therefore, rejection of the theoretical distribution in Figure 1.21 does not constitute rejection of the hypothesis of exponential distribution and Poisson flow, as it is the result of the statistical bias.
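
The effect of the missing longer-than-record intervals can be illustrated with a small simulation. The following is a minimal sketch, not the procedure used in this report: it assumes an ideal Poisson flow with a hypothetical rate and record length (the names `simulate_records`, `rate_by_counting`, `rate_by_intervals` and all numerical values are illustrative only) and compares the rate estimated by counting with the rate estimated as the inverse of the mean observed interval.

```python
import random


def simulate_records(rate, record_length, n_records, seed=0):
    """Simulate event times of an ideal Poisson flow in records of finite length."""
    rng = random.Random(seed)
    records = []
    for _ in range(n_records):
        times = []
        t = rng.expovariate(rate)
        while t < record_length:
            times.append(t)
            t += rng.expovariate(rate)
        records.append(times)
    return records


def rate_by_counting(records, record_length):
    """Average number of events per unit of time."""
    n_events = sum(len(times) for times in records)
    return n_events / (len(records) * record_length)


def rate_by_intervals(records):
    """Inverse of the mean observed interval between consecutive events.

    Intervals longer than a record can never be observed, so the sample
    of intervals is truncated and the rate estimate is pushed upward.
    """
    intervals = [b - a for times in records for a, b in zip(times, times[1:])]
    return len(intervals) / sum(intervals)


TRUE_RATE = 1.0 / 1200.0   # hypothetical rate of events, 1/s
RECORD_LENGTH = 1800.0     # hypothetical record length, s

records = simulate_records(TRUE_RATE, RECORD_LENGTH, n_records=2000, seed=1)
rate_count = rate_by_counting(records, RECORD_LENGTH)
rate_interval = rate_by_intervals(records)

print(f"true rate:              {TRUE_RATE:.6f} 1/s")
print(f"counting estimate:      {rate_count:.6f} 1/s")
print(f"interval-based estimate: {rate_interval:.6f} 1/s")
```

Under these assumptions the counting estimate stays close to the true rate, while the interval-based estimate is noticeably larger, which is the direction of the bias described above.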

This bias is absent in the distribution (1.122), as it is essentially based on counting upcrossings rather than on calculating the time interval between them. The absence of the bias gives the CDF (1.122) an advantage over a histogram of time intervals between the events. Use of the CDF, however, requires using the K-S test, which has known limitations. Strictly speaking, the K-S test is fully applicable only if the parameters of the fitted curve are known a priori and do not come from statistical estimates. As mentioned above, the K-S test does not have any mechanism to penalize the result for using statistically estimated parameters. If such parameters have been used, the K-S test may overestimate the probability that the difference between the observed data and the fitted curve is caused by random reasons. Using the K-S test in this example is justified, as the true value of the parameter is known. Using the K-S test for real numerical simulation or experimental results, where all the parameters are statistical estimates, is therefore not desirable but may be unavoidable.
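
The tendency of the K-S test to look too favorable when the parameter is estimated from the same sample can be seen in a small Monte Carlo experiment. This is a minimal sketch under stated assumptions: the samples are drawn from an ideal exponential distribution with a hypothetical rate, and the helper names (`ks_statistic`, `mean_ks_statistics`) are illustrative. The K-S statistic is computed once against the CDF with the known rate and once against the CDF with the rate fitted to the sample.

```python
import math
import random


def ks_statistic(sample, rate):
    """Largest distance between the empirical CDF and the exponential CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = 1.0 - math.exp(-rate * x)
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


def mean_ks_statistics(true_rate, sample_size, n_trials, seed=0):
    """Average K-S statistic with the known rate and with the fitted rate."""
    rng = random.Random(seed)
    d_known = 0.0
    d_fitted = 0.0
    for _ in range(n_trials):
        sample = [rng.expovariate(true_rate) for _ in range(sample_size)]
        fitted_rate = sample_size / sum(sample)  # inverse of the sample mean
        d_known += ks_statistic(sample, true_rate)
        d_fitted += ks_statistic(sample, fitted_rate)
    return d_known / n_trials, d_fitted / n_trials


# Hypothetical values chosen only for illustration.
d_known, d_fitted = mean_ks_statistics(true_rate=0.004, sample_size=50,
                                       n_trials=400, seed=2)
print(f"mean K-S statistic, known rate:  {d_known:.4f}")
print(f"mean K-S statistic, fitted rate: {d_fitted:.4f}")
```

Because the fitted curve is pulled toward the data, its K-S statistic is smaller on average, so a probability taken from the standard K-S distribution is too optimistic when the parameter is a statistical estimate.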

The results for the second dataset (the left curve in Figure 1.31), corresponding to the level of 6.75 m, are completely analogous. The analysis of the time intervals between the crossings shown in Figure 1.24, as well as the insert in Figure 1.24, indicates that Poisson flow is applicable. The only difference is that all the records have at least one upcrossing, so the statistical CDF reaches unity in Figure 1.31 and the uncensored and censored estimates coincide in Figure 1.25.

Figure 1.32 shows results for crossing levels of 9 m and 11 m. Only 153 upcrossings occurred for the level of 9 m and only 111 records (out of 200) had at least one upcrossing. The bias of the time interval between the crossings was so large that none of the curves fit the data in Figure 1.23. Also, probably for the same reason, there was no overlap between the confidence intervals of the estimates of mean value and standard deviation in the insert in Figure 1.23. As noted before, this outcome can only be explained by the influence of bias, as there is no reason why the exponential distribution should not be applicable when the level is raised and upcrossings become less frequent. This statement is supported by comparison of Figure 1.15 and Figure 1.16, where the time before the first crossing is analyzed. None of the curves fit the data in Figure 1.15. Figure 1.16 shows agreement with the censored data for curves based on theoretical parameters as well as on two statistical estimates. The left curve in Figure 1.32 shows agreement with the data and no censoring is needed. This confirms the bias explanation for Figure 1.15.

[Plot annotations. Level of crossing 9 m: 153 crossings total, 111 records with at least one crossing. Level of crossing 11 m: 10 crossings total, 10 records with at least one crossing. Hypothesis passed.]

Figure 1.32. Cumulative Distributions of Time Between Upcrossings for Levels of 11 m and 9 m

For the level of 11 m, only 10 upcrossings were observed. This is not enough data for a meaningful histogram of time intervals before the first upcrossing nor for time between the upcrossings. However, the methods based on counting the number of upcrossings still work. Figure 1.7 shows the estimate of the upcrossing rate based on the average number of upcrossings per unit of time, while the right curve in Figure 1.32 shows the CDF for time between/before the upcrossings. It has very few points, but this was enough for the K-S test to be used. The hypothesis of an exponential distribution was not rejected by the K-S test.

Figure 1.33 shows the statistical CDF for the levels 5.75 m and 5.0 m. The hypothesis of an exponential distribution is not rejected by the K-S test in either case. Analysis of the distribution of the time intervals between the upcrossings delivered mixed results. Figure 1.26 shows how results of the Pearson chi-square test become sensitive to a small change of the bin width for the level of 5.75 m. To reach a similar result for the 5 m level the "adjustments" need to be much larger, see Figure 1.27. At the same time all other symptoms of exponential distributions are present: substantial overlap of the confidence intervals of the mean value and standard deviation estimates in the inserts of both Figure 1.26 and Figure 1.27. Additionally, the distribution of time before the first crossing remains exponential for a level of 5 m, see Figure 1.13 and Figure 1.14.

The  conclusion  made  in  the  previous  section  was  that  the  exponential  distribution 
was  rejected  (upper  parts  of  Figure  1.26  and  Figure  1.27)  because  of  the  values  in  the 
first  bin.  This  is  a  reflection  of  increased  cases  of  upcrossings  occurring  on  successive 
periods;  essentially,  it  is  an  influence  of  the  group  structure  of  the  stochastic  process.  As 
it  was  expected  in  the  previous  subsection,  the  local  distortion  of  the  histogram  did  not 
lead  to  rejection  of  the  exponential  distribution  based  on  analysis  of  the  statistical  CDF. 
However,  it  is  clear  that  both  levels  5.75  m  and  5.0  m  are  not  very  far  from  the  boundary 
of  applicability  of  the  exponential  distribution  and  Poisson  flow. 


[Plot annotations. Level of crossing 5 m: 5407 crossings total, all records with at least one crossing. Hypothesis passed.]

Figure 1.33. Cumulative Distributions of Time Between Upcrossings for Levels of 5.75 m and 5 m

To examine when and how the breakdown of Poisson flow can be detected by the K-S test, upcrossings through levels of 4.75 m and 3 m were analyzed (see Figure 1.34). For the 3 m level, Poisson flow is clearly inapplicable, as can be seen from Figure 1.28; the histogram of the intervals between the upcrossings no longer has the appearance of an exponential distribution and the confidence intervals in the insert do not overlap.

The results of the K-S test shown in Figure 1.34 clearly reject the hypothesis of an exponential distribution. The shape of the statistical CDF is different from that of an exponential curve. This seems to be a natural outcome of the non-exponential character of the PDF in Figure 1.28. This also demonstrates the robustness of the considered technique; the combination of the CDF (1.122) and the K-S test has detected the inapplicability of Poisson flow as well as all other methods, with the exception of the analysis of the time before the first crossing (Figure 1.18).


[Plot annotations: horizontal axis T, s. Hypothesis did not pass.]

Figure 1.34. Cumulative Distributions of Time Between Upcrossings for Levels of 4.75 m and 3 m

Another  important  result  shown  in  Figure  1.34  is  a  rejection  of  the  exponential 
distribution  for  the  level  of  4.75  m.  The  rejection  means  that,  according  to  K-S  test,  the 
boundary  of  applicability  of  the  exponential  distribution  is  somewhere  between  the  levels 
of  5  m  and  4.75  m.  This  is  in  agreement  with  the  previous  conclusion  that  the  levels  5.0 
to  5.75  m  are  not  very  far  from  that  boundary;  so  the  sensitivity  of  Pearson  chi-square  test 
to  the  bin  width  may  be  giving  a  hint  that  the  boundary  of  applicability  is  somewhere 
near. 


1.3.5.  Direct  Test  of  Applicability  of  Poisson  Flow 

The CDF formula (1.122) in combination with a K-S test seems to be a robust technique for checking the applicability of the exponential distribution and Poisson flow. However, if the parameter of the distribution is a statistical estimate, the K-S test may overestimate the probability that the difference between the observed data and the fitted curve is caused by random reasons. The Pearson chi-square test allows a penalty to be introduced for the statistical estimate by reducing the degrees of freedom by one, and is therefore preferable.

The applicability of Poisson flow can be judged directly by calculating a histogram of the random number of upcrossings observed during a given time and then checking if the Poisson distribution fits the observed data. The goodness-of-fit can then be judged with the Pearson chi-square test.

Formula (1.30) gives the expression for the Poisson distribution in the form of the probability mass function (PMF), as the number of upcrossings is an integer:

$$P_T(k) = \frac{(\lambda T_k)^k}{k!} \exp(-\lambda T_k)$$    (1.131)

Here $\lambda$ is the rate of events, which is estimated statistically but is also known theoretically for the considered example, and $k$ is the number of upcrossings observed during time $T_k$.

To formulate the procedure, the time $T_k$ needs to be chosen. The record length seems to be the natural choice; however, this limits the size of the sample to the number of records. This sample size may not be sufficient even for the considered numerical example with 200 records. If this technique is to be applied to a model test, then a smaller window has to be introduced, as there may be few records.


Consider a sample of size $N_k$, which is the total number of time windows:

$$N_k = \frac{N_R \, n \, \Delta t}{T_k}$$    (1.132)

Here $N_R$ is the number of records, $n$ is the number of points in each record, and $\Delta t$ is the time step. The duration of the time window can be conveniently presented as a fraction of the duration of the record, or as the number of windows per record:

$$N_W = \frac{n \Delta t}{T_k}$$    (1.133)
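
As a quick arithmetic check of the window bookkeeping, the sketch below evaluates the number of windows per record and the total sample size. The record count (200) and window duration (360 s) are those quoted for the 6.75 m example; the number of points per record and the time step are hypothetical values chosen only so that the record duration is 1800 s, and the function name is illustrative.

```python
def window_bookkeeping(n_records, n_points, dt, window_duration):
    """Return (windows per record N_W, total number of windows N_k)."""
    record_duration = n_points * dt                         # duration of one record
    windows_per_record = record_duration / window_duration  # N_W
    sample_size = n_records * windows_per_record            # N_k = N_R * n * dt / T_k
    return windows_per_record, sample_size


# n_points and dt are assumptions; only their product (1800 s) matters here.
n_w, n_k = window_bookkeeping(n_records=200, n_points=3600, dt=0.5,
                              window_duration=360.0)
print(n_w, n_k)
```

With these inputs the result is 5 windows per record and a sample of 1000 windows.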


One of the properties of the Poisson distribution is that the mean value numerically equals the variance:

$$m_k = V_k = \lambda T_k$$    (1.134)


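Table 1 and Table 2 contain the ratio of estimates of the mean value and variance. The mean value and variance of the Poisson distribution can be statistically estimated:

$$\hat{m}_k = \frac{1}{N_k}\sum_{i=1}^{N_k} k_i ; \qquad \hat{V}_k = \frac{1}{N_k - 1}\sum_{i=1}^{N_k}\left(k_i - \hat{m}_k\right)^2$$    (1.135)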


Here $k_i$ is the number of upcrossings that was observed in the window $i$. It can be shown that:

$$\hat{m}_k = \hat{\lambda} T_k$$    (1.136)

where $\hat{\lambda}$ is an estimate of the rate of events based on "counting", that is, the average number of upcrossings per unit of time (1.46). For the proof, consider an auxiliary random variable $U$ defined at each time step that equals one if there is an upcrossing and zero if there is not (1.38). Without loss of generality, the definition of this auxiliary variable can be altered by the introduction of counting of time windows:

$$U_{i,j,l} = \begin{cases} 1 & x_{i,j,l} < a \,\cap\, x_{i,j,l+1} > a \\ 0 & \text{otherwise} \end{cases} \qquad i = 1,\ldots,N_R ;\; j = 1,\ldots,N_W ;\; l = 1,\ldots,N_P$$    (1.137)


where $N_P$ is the number of points in a window:

$$N_P = \frac{n}{N_W}$$    (1.138)


Based on this definition, the number of upcrossings in a window can be expressed as:

$$k_{i,j} = \sum_{l=1}^{N_P} U_{i,j,l}$$    (1.139)

Then, the estimate of the mean value of the number of upcrossings during the time windows yields:

$$\hat{m}_k = \frac{1}{N_k}\sum_{i=1}^{N_R}\sum_{j=1}^{N_W} k_{i,j} = \frac{1}{N_R N_W}\sum_{i=1}^{N_R}\sum_{j=1}^{N_W}\sum_{l=1}^{N_P} U_{i,j,l}$$    (1.140)

Taking into account that the two internal sums represent the number of upcrossings observed during one record:

$$\sum_{j=1}^{N_W}\sum_{l=1}^{N_P} U_{i,j,l} = \sum_{j=1}^{n} U_{i,j}$$    (1.141)

Substitution of (1.139) into (1.138) and then using formula (1.45):

$$\hat{m}_k = \frac{1}{N_R N_W}\sum_{i=1}^{N_R}\sum_{j=1}^{n} U_{i,j}$$    (1.142)

$$\hat{m}_k = \frac{n}{N_W}\,\hat{m}_U$$    (1.143)

Here $\hat{m}_U$ is a mean value estimate of the auxiliary random variable $U$. It is related to the estimate of the rate of events with formula (1.46). Substitution of (1.46) into (1.139) yields:

$$\hat{m}_k = \frac{\hat{\lambda}\, T_R}{N_W}$$

Here $T_R$ is the duration of a record:

$$T_R = n \Delta t$$    (1.144)

Substituting (1.142) into (1.141) and taking into account (1.129) leads to the expression (1.130) and this completes the proof.
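
The identity between the mean number of upcrossings per window and the counting estimate of the rate multiplied by the window duration can be checked numerically. The sketch below is illustrative only: it uses two short deterministic sine records rather than wave data (the identity is algebraic and holds for any record whose length is a whole number of windows), a non-strict comparison at the lower sample, and hypothetical helper names `upcrossing_flags` and `counts_per_window`.

```python
import math


def upcrossing_flags(record, level):
    """Flag equals 1 where the record crosses `level` upward between samples l and l+1."""
    return [1 if record[l] <= level < record[l + 1] else 0
            for l in range(len(record) - 1)]


def counts_per_window(records, level, windows_per_record):
    """Number of upcrossings in every time window of every record."""
    counts = []
    for record in records:
        flags = upcrossing_flags(record, level)
        points_per_window = len(record) // windows_per_record
        for j in range(windows_per_record):
            start = j * points_per_window
            counts.append(sum(flags[start:start + points_per_window]))
    return counts


# Hypothetical data: two records of 100 points, one sine period per 10 points.
dt = 1.0
n_points = 100
windows_per_record = 5
records = [[math.sin(2.0 * math.pi * (l + shift) / 10.0) for l in range(n_points)]
           for shift in (0, 3)]

counts = counts_per_window(records, level=0.5, windows_per_record=windows_per_record)
mean_count = sum(counts) / len(counts)                       # mean per window
rate_hat = sum(counts) / (len(records) * n_points * dt)      # counting estimate
window_duration = n_points * dt / windows_per_record         # window duration

print(counts)
print(mean_count, rate_hat * window_duration)
```

Both printed numbers on the last line coincide, as the proof requires; the window counts themselves are constant here only because the test signal is periodic.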

While numerical proximity of the estimates of the mean value and variance may be used as a qualitative indicator of possible applicability of Poisson flow, a chi-square goodness-of-fit test provides a more rigorous technique. As an example, Figure 1.35 shows details of these results for the level 6.75 m.

[Plot legend, Pearson chi-square test results:
Theoretical probability mass function: χ² = 3.79, d = 6, P(χ², d) = 0.704
Based on average number per unit of time: χ² = 2.81, d = 5, P(χ², d) = 0.728
Based on average time between crossings: χ² = 30.21, d = 5, P(χ², d) = 0.0000134
Based on average censored time before 1st crossing: χ² = 6.69, d = 5, P(χ², d) = 0.245
Based on average uncensored time before 1st crossing: χ² = 6.69, d = 5, P(χ², d) = 0.245
Plot annotations: crossing level 6.75 m; total 1421 crossings, all 200 records had crossings; number of time windows per record 5; duration of time window 360 s; volume of sample 1000; estimate of mean value 1.421; estimate of variance 1.351; ratio of mean value to variance 1.052.]

Figure 1.35. Probability Mass Function of the Number of Upcrossings During a Time Window


As in the previous analysis, five different methods were used to evaluate the single parameter of the Poisson distribution. The results of a Pearson chi-square goodness-of-fit test for all five methods are presented in Figure 1.35. This test did not reject the hypothesis for the theoretical distribution, demonstrating correct interpretation of the theory and a reasonable choice of parameters. This is consistent with the result shown in Figure 1.31.
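
The Pearson chi-square test of the Poisson PMF against the observed window counts can be sketched as follows. This is a minimal illustration, not the implementation used for Figure 1.35: the binning (one bin per count, with the last bin collecting all counts at or above the largest observed one), the helper names, and both data sets are assumptions. The degrees of freedom are taken as the number of bins minus one, reduced by one more when the parameter is a statistical estimate. The chi-square tail probability is evaluated with a simple series that is only meant for moderate values of χ².

```python
import math
from collections import Counter


def poisson_pmf(k, mean):
    """Poisson probability of exactly k events for a given mean number of events."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def chi_square_sf(x, dof):
    """Upper-tail probability of the chi-square distribution (series expansion)."""
    if x <= 0.0:
        return 1.0
    a, half_x = 0.5 * dof, 0.5 * x
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= half_x / (a + n)
        total += term
    log_scale = -half_x + a * math.log(half_x) - math.lgamma(a)
    return max(0.0, 1.0 - total * math.exp(log_scale))


def poisson_chi_square_test(counts, mean, n_estimated):
    """Return (chi-square, degrees of freedom, probability) for a Poisson fit."""
    n_windows = len(counts)
    observed = Counter(counts)
    k_last = max(counts)
    dof = k_last - n_estimated          # (k_last + 1) bins, minus 1, minus estimates
    if dof < 1:
        raise ValueError("not enough bins for the requested degrees of freedom")
    chi2, cumulative = 0.0, 0.0
    for k in range(k_last):
        p = poisson_pmf(k, mean)
        cumulative += p
        expected = n_windows * p
        chi2 += (observed.get(k, 0) - expected) ** 2 / expected
    expected_tail = n_windows * (1.0 - cumulative)   # all counts >= k_last
    chi2 += (observed[k_last] - expected_tail) ** 2 / expected_tail
    return chi2, dof, chi_square_sf(chi2, dof)


# Hypothetical counts that closely follow a Poisson PMF with a known mean of 1.
poisson_like = [0] * 368 + [1] * 368 + [2] * 184 + [3] * 61 + [4] * 19
chi2_a, dof_a, p_a = poisson_chi_square_test(poisson_like, mean=1.0, n_estimated=0)

# Hypothetical clustered counts: events arrive in pairs; mean estimated from data.
clustered = [0] * 500 + [2] * 500
mean_b = sum(clustered) / len(clustered)
chi2_b, dof_b, p_b = poisson_chi_square_test(clustered, mean=mean_b, n_estimated=1)

print(f"Poisson-like: chi2={chi2_a:.4f} d={dof_a} P={p_a:.4f}")
print(f"clustered:    chi2={chi2_b:.1f} d={dof_b} P={p_b:.2e}")
```

In practice, bins with very small expected counts are usually merged before the test is applied; that step is omitted here to keep the sketch short.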

The distribution defined using the parameter estimated as the average number of upcrossings per unit of time was not rejected either, confirming the robustness and reliability of the "counting" method. This is consistent with the results shown in Figure 1.25, where the "counting" method produced the number closest to the correct answer known from theory.

The chi-square goodness-of-fit test rejected the distribution based on the average time between crossings, most likely because of insufficient record length. This effect can be seen in both Figure 1.24 and Figure 1.25.

As all the records had at least one crossing, the methods based on uncensored and censored mean time before the first upcrossing produced identical results and were not rejected. These results are generally consistent with the previous analysis, see Figure 1.25.


To complete the formulation of the procedure, the size of the time window $T_k$ has to be chosen. As the number of upcrossings is an integer, the width of the histogram bin is one and the number of bins, $N_{max}$, is defined by the maximum number of upcrossings observed during a time window.

Obviously, $N_{max}$ is expected to be larger for larger values of $T_k$. Therefore, taking into account (1.129), $N_{max}$ is expected to be larger for a smaller sample size $N_k$. Naturally, a constant duration of the time window for different levels of crossing makes a very obscure histogram, as the decrease of sample size leads to a large number of bins.

Therefore, it makes sense to keep $N_{max}$ relatively constant by adjusting the duration of the time window. This leads to an increase of hits in each bin once the sample size grows, which seems more natural. The number of windows per record can be chosen to satisfy a condition for a constant number of bins, say:

$$N_{max} = 7$$    (1.145)

If condition (1.143) cannot be met, especially for a small number of crossings, the number of windows yielding $N_{max}$ closest to 7 is chosen. If there are several window sizes that satisfy (1.143), the largest one is chosen. Deviations from this rule should be specially noted and commented on.
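
The selection rule can be written down compactly. The sketch below is one possible reading of the rule, not the authors' code: it assumes that "the largest one" refers to the largest window duration, i.e. the smallest number of windows per record, and it applies the same tie-break when the target number of bins cannot be met exactly (that extension is an assumption). The function name is hypothetical; the example dictionary uses the numbers of bins listed for 1 to 20 windows per record in Table 3.

```python
def select_windows_per_record(bins_by_windows, target_bins=7):
    """Pick the number of windows per record whose number of bins is closest to the target.

    bins_by_windows maps windows per record -> observed number of bins.
    Ties are resolved in favor of the largest window (fewest windows per record).
    """
    best_distance = min(abs(n_bins - target_bins)
                        for n_bins in bins_by_windows.values())
    candidates = [n_windows for n_windows, n_bins in bins_by_windows.items()
                  if abs(n_bins - target_bins) == best_distance]
    return min(candidates)


# Number of bins versus windows per record, crossing level 5.75 m (Table 3).
bins_5_75 = {1: 27, 2: 16, 3: 14, 4: 11, 5: 11, 6: 10, 7: 11, 8: 9, 9: 9, 10: 8,
             11: 7, 12: 8, 13: 7, 14: 8, 15: 6, 16: 7, 17: 7, 18: 6, 19: 7, 20: 6}
chosen = select_windows_per_record(bins_5_75)
print(chosen)

# Hypothetical rare-event case where 7 bins cannot be reached.
chosen_rare = select_windows_per_record({1: 3, 2: 2})
print(chosen_rare)
```

For the 5.75 m data this reading selects 11 windows per record, the largest window that gives exactly 7 bins.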

It is known that results of the Pearson chi-square goodness-of-fit test may be sensitive to the number of bins. For analysis of time intervals before and between the upcrossings, the formula for the width of a bin (1.110) was used. The condition (1.143) used to size the histogram is somewhat arbitrary; therefore a sensitivity study was carried out for the level of 6.75 m. This crossing level was chosen because there are enough upcrossings (1421) and the tests carried out previously did not reject the hypothesis of Poisson flow. The window size was changed systematically and the results are summarized in Table 1.


Table 1. Sensitivity to Time Window Size for the Crossing Level of 6.75 m

The results of testing four distributions are placed into Table 1. All the records had at least one upcrossing, so there is no difference between censored and uncensored data for the mean time before the first upcrossing.


The hypothesis of a Poisson distribution was not rejected for the theoretical parameter for a continuous range from 2 to 15 windows per record. There were only three window sizes for which the probability of a good fit was less than the significance level of 5% (1, 25 and 40 windows per record). However, even for these cases, the probability did not go below 3%. The theoretical criterion for any goodness-of-fit test is just the finiteness of the probability that the difference between the observed data and the hypothesis is caused by random reasons. Therefore, variation of these probabilities from 93% to 3% does not necessarily constitute sensitivity. It does not change the outcome that the hypothesis is not rejected. Therefore, the sensitivity to window size is very small when using a theoretical parameter.

A similar conclusion can be made for the distributions based on counting and on time between upcrossings. The outcome of the goodness-of-fit analysis does not change significantly with changing window size.

The situation is less stable for the distribution based on time before the first upcrossing. The instability can be explained by the fact that this estimate uses less statistical information than the others, which is reflected in wider confidence intervals, see Figure 1.25.

Once  insensitivity  to  window  size  has  been  demonstrated,  the  next  objective  is  to 
see  how  this  method  behaves  when  the  assumption  of  Poisson  flow  is  no  longer 
applicable. 

Table 2 contains results of systematic calculations for different crossing thresholds. The table includes results of a chi-square test computed five different ways: theoretical (1.81); statistical, based on counting upcrossings (1.46); statistical, based on mean time between upcrossings (1.96); statistical, based on uncensored mean time before the first upcrossing (1.100); and statistical, based on censored mean time before the first upcrossing (1.103).

Results for both the theoretical distribution and the distribution based on counting indicate the applicability of Poisson flow above the level of 6 m. Levels 5.75 m and 5.5 m show some sensitivity to window size (see Table 3 for the level 5.75 m). The area of sensitivity to window size generally corresponds to the area of sensitivity to bin size, see Figure 1.26 and Figure 1.27. It seems plausible that such "grey" areas are indicators that the independence of upcrossings is about to be violated and Poisson flow will become inapplicable soon (or is inapplicable already). The inapplicability of Poisson flow is indicated consistently for all the levels below 5.5 m. The boundary between applicability and inapplicability determined by this method is a little higher than the one evaluated using the K-S test (between 5 m and 4.75 m).

The distribution based on time between upcrossings was rejected for all the levels. The sample of time intervals between the crossings did not produce representative statistical estimates for the accepted calculation parameters (number and length of records, time step, etc.). As can be seen from Figure 1.9, Figure 1.22, and Figure 1.25, the length of the record was too small to estimate the rate of events correctly for the levels of 9 m, 7.5 m and 6.75 m, respectively. As a result, a Poisson distribution based on this estimate was rejected. The length of records seems to be barely enough for the level of 5 m, see Figure 1.8, but this level was too low to assure independence of upcrossings, and as a result Poisson flow was not applicable and the distribution was rejected again.


Table 2. Evaluation of Applicability of Poisson Flow

* Value was manually chosen to see if the hypothesis can pass. Possible sensitivity to number of windows.


The distribution based on uncensored time before the first upcrossing was not rejected for the levels of 7 m and 6.75 m. It was rejected for the levels of 6.5 m and 6 m, probably because of insufficient accuracy of the estimate. For the level of 5.75 m and below, Poisson flow may already be inapplicable. For the levels above 7 m the hypothesis was rejected because of bias in the estimate caused by insufficient length of record, see Figure 1.9 and Figure 1.22.

The  distribution  based  on  censored  mean  time  before  the  first  upcrossing, 
however,  is  not  rejected  up  to  the  level  of  10  m.  This  is  not  surprising  as  the  censoring 
procedure  takes  care  of  bias  in  the  parameter  estimate,  see  Figure  1.8  and  Figure  1.22. 
The  rejection  of  the  level  below  6  m  is  caused  by  the  same  reasons  as  the  rejection  of  the 
distribution  based  on  the  uncensored  estimate. 


Table 3. Sensitivity to Time Window Size for the Crossing Level 5.75 m

Columns: N_W, number of windows per record; T_k, duration of time window; N_max, number of bins; N_k, volume of sample; m_k/V_k, ratio of estimates of mean value and variance.

                                    Results of Pearson chi-square goodness-of-fit test for distribution based on
                                    Theory            Counting          Time between      Time before the
                                                                        crossings         1st crossing
N_W  T_k, s  N_max  N_k    m_k/V_k  χ²     P(χ²,d)    χ²     P(χ²,d)    χ²     P(χ²,d)    χ²     P(χ²,d)
1    1800    27     200    2.29     55.07  7.44E-04   55.08  4.81E-04   64.55  2.40E-05   77.61  2.69E-07
2    900     16     400    1.43     22.45  9.65E-02   22.48  6.93E-02   31.01  5.50E-03   43.29  7.70E-05
3    600     14     600    1.34     25.1   2.24E-02   25.14  1.42E-02   33.25  8.86E-04   45.51  8.43E-06
4    450     11     800    1.13     9.74   4.63E-01   9.8    3.67E-01   17.99  4.00E-02   30.9   3.08E-04
5    360     11     1000   1.14     16.82  7.86E-02   16.84  5.13E-02   26.74  1.54E-03   40.39  6.45E-06
6    300     10     1200   1.15     21.05  1.24E-02   21.08  6.93E-03   30.41  1.79E-04   43.75  6.33E-07
7    257     11     1400   1.07     12.54  2.50E-01   12.6   1.81E-01   21.56  1.00E-02   35.45  4.96E-05
8    225     9      1600   1.06     5.1    7.47E-01   5.11   6.46E-01   16.16  2.00E-02   30.79  6.78E-05
9    200     9      1800   1.11     11.61  1.70E-01   11.64  1.13E-01   21.31  3.34E-03   35.07  1.09E-05
10   180     8      2000   1.07     6.65   4.66E-01   6.68   3.52E-01   [?]    8.31E-03   31.66  1.90E-05
11   163.5   7      2200   1.05     5.05   5.38E-01   5.09   4.05E-01   14.67  1.00E-02   28.62  2.75E-05
12   150     8      2400   1.06     11.05  1.36E-01   11.09  8.57E-02   20.98  1.85E-03   35.08  4.16E-06
13   138.5   7      2600   1.08     9.85   1.31E-01   9.88   7.87E-02   19.8   1.36E-03   33.76  2.65E-06
14   128.5   8      2800   1.06     11.18  1.31E-01   11.23  8.16E-02   20.84  1.96E-03   34.84  4.64E-06
15   120     6      3000   1.05     5.78   3.29E-01   5.82   2.13E-01   15.53  3.72E-03   29.53  6.09E-06
16   112.5   7      3200   1.03     5.32   5.03E-01   5.34   3.76E-01   16.76  4.98E-03   31.74  6.69E-06
17   106     7      3400   1.06     13.81  3.18E-02   13.83  1.67E-02   24.57  1.69E-04   39.08  2.29E-07
18   100     6      3600   1.03     17.93  3.04E-03   17.97  1.25E-03   27.86  1.33E-05   42.17  1.54E-08
19   94.5    7      3800   1.05     19.52  3.36E-03   19.58  1.50E-03   28.63  2.74E-05   42.44  4.79E-08
20   90      6      4000   1.01     7.45   1.89E-01   7.47   1.13E-01   18.65  9.21E-04   33.65  8.78E-07
21   85.5    6      4200   1.06     24.55  1.70E-04   24.62  6.01E-05   32.96  1.22E-06   46.31  2.13E-09
22   82      6      4400   1.03     21.9   5.46E-04   21.93  2.07E-04   32.84  1.29E-06   47.69  1.09E-09
23   78.5    6      4600   1.03     6.92   2.27E-01   6.92   1.40E-01   18.98  7.94E-04   34.32  6.40E-07
24   75      6      4800   1.05     12.58  2.77E-02   12.61  1.33E-02   22.73  1.43E-04   36.96  1.83E-07
25   72      6      5000   1.02     13.59  1.85E-02   13.61  8.63E-03   24.31  6.91E-05   39.05  6.79E-08
30   60      5      6000   1.03     8.71   6.87E-02   8.75   3.28E-02   18.69  3.17E-04   32.9   3.37E-07
40   45      4      8000   1.04     3.79   2.85E-01   3.85   1.46E-01   12.47  1.96E-03   25.9   2.38E-06
50   36      4      10000  1.04     17.53  5.51E-04   17.6   1.51E-04   25.51  2.89E-06   38.66  4.03E-09

[?] Value illegible in the source.

1.4. Summary

The  probability  of  a  large  roll  event  (partial  stability  failure)  is  related  to  exposure 
time.  This  probability  grows  with  time.  The  time  before  a  large  roll  event  is  a  random 
number. 

If  sequential  large  roll  events  can  be  considered  independent,  the  number  of  such 
events  during  a  given  time  follows  a  Poisson  distribution  and  the  time  before  a  large  roll 
event  (or  between  them)  has  an  exponential  distribution. 

Both Poisson and exponential distributions share a single parameter that completely defines both distributions. This parameter is the rate of events (average number of events per unit of time) and is equal to the inverse of the mean time before or between the events.
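
The role of the single shared parameter can be illustrated with a short calculation. The sketch below assumes Poisson flow holds and uses the counts quoted for the 6.75 m example (1421 upcrossings in 200 records of 1800 s each); the function names are hypothetical and the 360 s exposure time is chosen only because it matches the window used in that example.

```python
import math


def rate_by_counting(n_events, n_records, record_duration):
    """Rate of events: average number of events per unit of time."""
    return n_events / (n_records * record_duration)


def probability_of_at_least_one_event(rate, exposure_time):
    """Exponential distribution: probability of one or more events in the exposure time."""
    return -math.expm1(-rate * exposure_time)


rate = rate_by_counting(n_events=1421, n_records=200, record_duration=1800.0)
mean_time = 1.0 / rate                     # mean time before or between events, s
expected_events = rate * 360.0             # Poisson mean for a 360 s exposure
p_event = probability_of_at_least_one_event(rate, 360.0)

print(f"rate of events:            {rate:.6f} 1/s")
print(f"mean time between events:  {mean_time:.1f} s")
print(f"expected events in 360 s:  {expected_events:.3f}")
print(f"P(at least one in 360 s):  {p_event:.4f}")
```

The same rate feeds both distributions: multiplied by the exposure time it gives the Poisson mean, and through the exponential distribution it gives the probability of at least one event.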

If the distributions of instantaneous roll angle and roll rate are known, the rate of events can be found using upcrossing theory.

Large-amplitude  roll  motion  is  the  response  of  a  dynamical  system  with 
significant  nonlinearity.  Even  if  the  excitation  of  such  a  system  has  a  normal  distribution, 
the  response  can  be  significantly  non-Gaussian.  As  reliable  modeling  of  the  distribution 
of  roll  angle  may  be  difficult,  statistical  evaluation  of  the  rate  of  events  is  of  practical 
interest. 

Three methods of statistical evaluation of the rate of events were considered. The first method is based on counting the upcrossing events and estimating an average number of events per unit of time. The second method was based on the average time before the first event occurs, while the third method involved estimation of the average time between events. Evaluation of the confidence interval was included with all three methods.

A numerical example was formulated to examine how these methods work. Simulated wave elevations were chosen to serve as the data set for this example. Their distribution is known to be normal. Therefore, the theoretical value for the rate of events or upcrossing rate is available. These methods can therefore be judged based on how close the results come to the theoretical answer.

A different degree of rarity of the upcrossing events was modeled by varying the crossing level; a high crossing level leads to fewer upcrossings, so the events become rarer and the methods can be tested for different conditions.

It was found that the methods based on time before and between the events may be biased due to insufficient record length. The method based on counting does not have this problem. This bias for the method based on average time before the first upcrossing can be corrected by censoring.

The counting method was found to be preferable as it is not biased and has less statistical uncertainty in comparison with the method based on the censored mean value of time before the first event.

Several methods were considered for checking if upcrossing events follow Poisson flow and, therefore, if the exponential distribution can be used to compute the probability of at least one event during a given exposure time. All of these methods used a goodness-of-fit test to check if the distribution of the observed data follows either an exponential or a Poisson distribution. These methods were:

1.  Check  if  statistical  PDF  of  time  before  the  first  event  is  exponential  using  Pearson 
chi-square  goodness-of-fit  test; 

2.  Check  if  statistical  PDF  of  time  between  the  events  is  exponential  using  Pearson 
chi-square  goodness-of-fit  test; 

3. Check if statistical CDF of time between the events is exponential using Kolmogorov-Smirnov goodness-of-fit test;

4.  Check  if  statistical  probability  mass  function  (PMF)  of  number  of  events  during 
given  time  follows  Poisson  distribution. 

The  numerical  example  was  also  used  to  test  all  these  methods.  The  theoretical 
distribution  available  for  the  numerical  example  was  used  to  check  if  the  calculation 
parameters  (number  and  length  of  record,  time  step,  etc.)  were  selected  correctly  so  the 
results  can  be  decisive. 

Methods 2, 3 and 4 were found to be able to correctly detect a violation of Poisson
flow caused by dependence of the upcrossing events. However, method 2 uses a biased
distribution and method 3 may overestimate the goodness-of-fit if statistical estimates
are used for the parameters of the distribution. Method 4 is therefore preferable.

In conclusion, two techniques are chosen for the procedure:

•  The rate of events is to be estimated as the average number of upcrossings per unit
of time - the “counting” method

•  The applicability of Poisson flow is to be checked using a Pearson chi-square
goodness-of-fit test applied to the statistical PMF of the number of upcrossings
during a given time.
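
The two chosen techniques can be sketched in a few lines of Python. This is a minimal illustration, not the report's implementation: the function names, the pooling of the tail into a single bin, and the synthetic data are all assumptions made here for the example. The resulting statistic would be compared against a chi-square quantile with (number of bins - 2) degrees of freedom, since one parameter (the rate) is estimated from the data.

```python
import math
import random

def counting_rate(counts, window):
    """Counting method: average number of upcrossings per unit of time."""
    return sum(counts) / (len(counts) * window)

def poisson_pmf(k, mean):
    """Poisson probability of exactly k events when the expected number is mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def pearson_chi_square(counts, mean, max_bin):
    """Pearson statistic for bins 0..max_bin-1 plus one pooled tail bin (>= max_bin)."""
    n = len(counts)
    observed = [0] * (max_bin + 1)
    for c in counts:
        observed[min(c, max_bin)] += 1
    probs = [poisson_pmf(k, mean) for k in range(max_bin)]
    probs.append(1.0 - sum(probs))  # probability of the pooled tail bin
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))

# Synthetic check (hypothetical numbers): events with exponential waiting times.
random.seed(1)
true_rate, window, n_windows = 0.002, 1800.0, 200
counts = []
for _ in range(n_windows):
    t, c = random.expovariate(true_rate), 0
    while t < window:
        c += 1
        t += random.expovariate(true_rate)
    counts.append(c)

rate = counting_rate(counts, window)
chi2 = pearson_chi_square(counts, rate * window, max_bin=6)
print("estimated rate:", rate, "chi-square statistic:", chi2)
```

Pooling the tail keeps the expected count in every bin away from zero, which the Pearson test needs; the choice of `max_bin` is a judgment call that depends on the mean number of events per window.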
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2.  Extreme  Value  Theory 


2.1. Background 

Gumbel (1958) formulated Extreme Value Theory (EVT) in its modern form. One
of the immediate applications of EVT was the prediction of extreme flooding based on
multi-year observations. There is a series of measurements of the water level in a river
observed during a year. Taking the largest measurement for each year, a series of extreme
values is created. The basic question posed was then: "What would be the level for a
one-hundred-year flood?"

Extreme value theory looks at another aspect of the problem of rare events. While
the Poisson distribution (considered in detail in the previous section) answers how many
rare events could occur in a given time period, EVT looks into the distribution of the
magnitude of the rare events.

The concept of order statistics is another mathematical tool that is related to
extreme value theory. Order statistics are reviewed briefly in the next subsection.

2.1.1. Distribution of Order Statistics

As order statistics can be applied to both roll angle and time before or between
large roll events, the following text uses the generic nomenclature $X$ for observations of a
continuous random variable $x$. The following represents a generic derivation that can
be found in a number of statistical textbooks. Similar to derivations in the previous
section, it has been placed here for the sake of completeness.

Consider a series of $n$ independent observations of a continuous random variable
$X_1, X_2, \ldots, X_n$. The largest (or the smallest) observation out of $n$ is also a random variable
with a distribution that is different from the distribution of $x$.

The set of values $X_1, X_2, \ldots, X_n$ is sorted (ordered) from smallest to largest, so that
$X_{(1)}$ is the smallest observed value while $X_{(n)}$ is the largest. The value $X_{(k)}$, which is $k$-th
from the smallest, is defined as the $k$-th order statistic. The objective is to find the PDF of
the $k$-th order statistic, $f_{(k)}(x)$.

The PDF is, by definition, the derivative of the CDF:

$$f_{(k)}(x) = \frac{dF_{(k)}(x)}{dx} \qquad (2.1)$$

The cumulative distribution function (CDF) of the $k$-th order statistic is defined as
the probability that the encountered value of the $k$-th order statistic is less than the
argument of this function:

$$F_{(k)}(x) = P(X_{(k)} \le x) \qquad (2.2)$$


However, if

$$X_{(k+1)} \le x \;\Rightarrow\; X_{(k)} \le x \qquad (2.3)$$

because the values $X_{(i)}$ are sorted in ascending order (by the definition of the order statistic), so

$$X_{(k)} \le X_{(k+1)} \qquad (2.4)$$

The converse is not true: if $X_{(k)} \le x$, it does not necessarily mean that $x$ is also larger
than $X_{(k+1)}$. The following expression is biconditional, but it holds for $k = n-1$ only:

$$\{X_{(n)} \le x\} \cup \{X_{(n-1)} \le x < X_{(n)}\} \;\Leftrightarrow\; \{X_{(n-1)} \le x\}; \quad k = n-1 \qquad (2.5)$$

Formula (2.5) can be generalized for any value of $k$:

$$\{X_{(n)} \le x\} \cup \left[\bigcup_{i=k}^{n-1} \{X_{(i)} \le x < X_{(i+1)}\}\right] \;\Leftrightarrow\; \{X_{(k)} \le x\} \qquad (2.6)$$


Formula (2.6) expresses all possible ways in which the condition $X_{(k)} \le x$ can be fulfilled.
The CDF of the $k$-th order statistic can be expressed by substitution of (2.6) into (2.2):

$$F_{(k)}(x) = P(X_{(k)} \le x) = P\left(\{X_{(n)} \le x\} \cup \left[\bigcup_{i=k}^{n-1} \{X_{(i)} \le x < X_{(i+1)}\}\right]\right) \qquad (2.7)$$


Equation (2.7) is the probability of a union of random events for which the
corresponding conditions are true. As can be seen from equation (2.7), all the
conditions are incompatible, so the probability of these random events occurring
simultaneously is zero. The probability of a union of incompatible events equals
the sum of their probabilities:

$$F_{(k)}(x) = P(X_{(n)} \le x) + \sum_{i=k}^{n-1} P(X_{(i)} \le x < X_{(i+1)}) \qquad (2.8)$$

Consider the first component, $P(X_{(n)} \le x)$. It is the probability that the argument is
greater than the $n$-th order statistic, which is the largest observed value. As there were $n$
observations in total, the condition $X \le x$ has to be satisfied $n$ times:

$$P(X_{(n)} \le x) = (P(X \le x))^n = (F(x))^n \qquad (2.9)$$

The component under the summation symbol in expression (2.8) represents the
probability that $x$ exceeds the values seen in $i$ observations, but is less than
any of the values encountered in the other $n-i$ observations:

$$P(X_{(i)} \le x < X_{(i+1)}) = C(n,i)\,(P(X \le x))^{i}\,(P(X > x))^{n-i} \qquad (2.10)$$

Here $C(n,i)$ stands for the number of distinct variants in which $x$ exceeds the values
seen in $i$ observations, but does not exceed the remaining $n-i$ observations. It is the
number of combinations in which $i$ values can be chosen out of $n$:

$$C(n,i) = \frac{n!}{i!\,(n-i)!} \qquad (2.11)$$

It is not difficult to recognize that expression (2.10) is, in fact, the binomial distribution:

$$P(X_{(i)} \le x < X_{(i+1)}) = C(n,i)\,p^{i}q^{n-i} \qquad (2.12)$$

$$p = P(X \le x); \quad q = P(X > x) = 1 - P(X \le x) = 1 - p$$

By definition, $p$ is the CDF of the random variable $x$:

$$p = P(X \le x) = F(x) \qquad (2.13)$$

Substitution of formulae (2.9), (2.12) and (2.13) into the expression (2.8) yields:

$$F_{(k)}(x) = (F(x))^{n} + \sum_{i=k}^{n-1} C(n,i)\,(F(x))^{i}\,(1-F(x))^{n-i} \qquad (2.14)$$

The first term in the expression (2.14) can be presented in the following form:

$$(F(x))^{n} = C(n,n)\,(F(x))^{n}\,(1-F(x))^{0} = \left. C(n,i)\,(F(x))^{i}\,(1-F(x))^{n-i} \right|_{i=n} \qquad (2.15)$$

Then, it can be incorporated into the sum. This leads to the following expression for the CDF
of the $k$-th order statistic:

$$F_{(k)}(x) = \sum_{i=k}^{n} C(n,i)\,(F(x))^{i}\,(1-F(x))^{n-i} \qquad (2.16)$$


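Substitution of (2.16) into (2.1) leads to the PDF of the $k$-th order statistic. The derivative of the sum
is the sum of the derivatives:

$$f_{(k)}(x) = \frac{d}{dx}\sum_{i=k}^{n} C(n,i)\,(F(x))^{i}\,(1-F(x))^{n-i} = \sum_{i=k}^{n} \frac{d}{dx}\left[C(n,i)\,(F(x))^{i}\,(1-F(x))^{n-i}\right] \qquad (2.17)$$

Applying the product rule of differentiation:

$$f_{(k)}(x) = \sum_{i=k}^{n} C(n,i)\left[(1-F(x))^{n-i}\,\frac{d}{dx}(F(x))^{i} + (F(x))^{i}\,\frac{d}{dx}(1-F(x))^{n-i}\right] \qquad (2.18)$$

Considering each of the derivatives, using the chain rule of differentiation:

$$\frac{d}{dx}(F(x))^{i} = i\,(F(x))^{i-1} f(x); \qquad \frac{d}{dx}(1-F(x))^{n-i} = (n-i)\,(1-F(x))^{n-i-1}\,(-f(x)) \qquad (2.19)$$

Substitution of (2.19) into (2.18) yields:

$$f_{(k)}(x) = f(x)\sum_{i=k}^{n} C(n,i)\left[i\,(F(x))^{i-1}(1-F(x))^{n-i} - (n-i)\,(F(x))^{i}(1-F(x))^{n-i-1}\right] \qquad (2.20)$$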


Expand the formula (2.11) for the number of combinations $C(n,i)$ and consider each
component of (2.20) separately:

$$C(n,i)\,i\,(F(x))^{i-1}(1-F(x))^{n-i} = \frac{n!}{i!\,(n-i)!}\,i\,(F(x))^{i-1}(1-F(x))^{n-i} = \frac{(n-1)!\,n}{(i-1)!\,((n-1)-(i-1))!}\,(F(x))^{i-1}(1-F(x))^{n-i} = C(n-1,i-1)\,n\,(F(x))^{i-1}(1-F(x))^{n-i} \qquad (2.21)$$

$$C(n,i)\,(n-i)\,(F(x))^{i}(1-F(x))^{n-i-1} = \frac{n!}{i!\,(n-i-1)!}\,(F(x))^{i}(1-F(x))^{n-i-1} = \frac{(n-1)!\,n}{i!\,((n-1)-i)!}\,(F(x))^{i}(1-F(x))^{n-i-1} = C(n-1,i)\,n\,(F(x))^{i}(1-F(x))^{n-i-1} \qquad (2.22)$$

Substitution of (2.21) and (2.22) back into (2.20) yields:

$$f_{(k)}(x) = n f(x)\left[\sum_{i=k}^{n} C(n-1,i-1)\,(F(x))^{i-1}(1-F(x))^{n-i} - \sum_{i=k}^{n} C(n-1,i)\,(F(x))^{i}(1-F(x))^{n-i-1}\right] \qquad (2.23)$$

Consider the first sum in the expression (2.23); its expansion looks like:

$$\sum_{i=k}^{n} C(n-1,i-1)\,(F(x))^{i-1}(1-F(x))^{n-i} = C(n-1,k-1)\,(F(x))^{k-1}(1-F(x))^{n-k} + C(n-1,k)\,(F(x))^{k}(1-F(x))^{n-k-1} + C(n-1,k+1)\,(F(x))^{k+1}(1-F(x))^{n-k-2} + \ldots + C(n-1,n-1)\,(F(x))^{n-1}(1-F(x))^{0} \qquad (2.24)$$

Consider the second sum in the expression (2.23); its expansion looks like:

$$\sum_{i=k}^{n} C(n-1,i)\,(F(x))^{i}(1-F(x))^{n-i-1} = C(n-1,k)\,(F(x))^{k}(1-F(x))^{n-k-1} + C(n-1,k+1)\,(F(x))^{k+1}(1-F(x))^{n-k-2} + C(n-1,k+2)\,(F(x))^{k+2}(1-F(x))^{n-k-3} + \ldots + C(n-1,n)\,(F(x))^{n}(1-F(x))^{-1} \qquad (2.25)$$


Note that the second term in the expanded sum (2.24) is identical to the first term
in the expanded sum (2.25), while the third term of (2.24) is equal to the second term of
(2.25), and so forth. As the expression (2.23) is the difference between the sums (2.24) and
(2.25), only the first term of (2.24) and the last term of (2.25) survive:

$$f_{(k)}(x) = n f(x)\left(C(n-1,k-1)\,(F(x))^{k-1}(1-F(x))^{n-k} - C(n-1,n)\,(F(x))^{n}(1-F(x))^{-1}\right) \qquad (2.26)$$

Consider the second term in equation (2.26): the coefficient there expresses the
number of ways in which $n$ values can be chosen out of $n-1$, which is zero:

$$C(n-1,n) = 0 \qquad (2.27)$$

As a result, the second term in (2.26) equals zero. Finally, the PDF of the $k$-th order statistic
is:

$$f_{(k)}(x) = f(x)\,\frac{n!}{(k-1)!\,(n-k)!}\,(F(x))^{k-1}(1-F(x))^{n-k} \qquad (2.28)$$
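
The end result of this derivation can be checked numerically. The sketch below is an illustration added here, not part of the original derivation: it evaluates the CDF of the k-th order statistic as the binomial sum over i = k..n, evaluates the PDF with the factor n!/((k-1)!(n-k)!), and compares the PDF with a finite-difference derivative of the CDF for a uniform parent distribution, where F(x) = x and f(x) = 1. The function names are hypothetical.

```python
import math

def order_stat_cdf(F, k, n):
    """CDF of the k-th order statistic: binomial sum over i = k..n."""
    return sum(math.comb(n, i) * F ** i * (1.0 - F) ** (n - i)
               for i in range(k, n + 1))

def order_stat_pdf(f, F, k, n):
    """PDF of the k-th order statistic: f * n!/((k-1)!(n-k)!) * F^(k-1) * (1-F)^(n-k)."""
    coef = math.factorial(n) / (math.factorial(k - 1) * math.factorial(n - k))
    return f * coef * F ** (k - 1) * (1.0 - F) ** (n - k)

# Uniform parent on (0, 1): F(x) = x and f(x) = 1.
x, k, n, h = 0.3, 2, 5, 1e-5
numeric = (order_stat_cdf(x + h, k, n) - order_stat_cdf(x - h, k, n)) / (2.0 * h)
analytic = order_stat_pdf(1.0, x, k, n)
print("finite difference:", numeric, "closed form:", analytic)
```

Two limiting cases are easy to read off: k = n gives the distribution of the largest value, F(x)^n, and k = 1 gives the distribution of the smallest, 1 - (1 - F(x))^n.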


2.1.2.  Extreme  Value  Distributions 

In  general,  the  distribution  of  extreme  values  is  a  particular  case  of  distribution  of 
order  statistics  (Gumbel,  1958). 

Consider a set of independent identically distributed random values $\{x_1, \ldots, x_n\}$; the
limiting cumulative distribution has been shown to be of the form (Davison, 2003):

$$P(x > x_m) = F_{GEV}(x) = \exp\left\{-\left[1 + \gamma\,\frac{x-\theta}{\sigma}\right]^{-1/\gamma}\right\} \qquad (2.29)$$

Here $\theta$ is a location parameter, $\sigma$ is a scale parameter and $\gamma$ is a shape parameter.

This is the Generalized Extreme Value (GEV) distribution and holds for the
maxima of observed values of $x$, regardless of how $x$ is distributed itself. The parameter $\gamma$
is often referred to as the extreme value index and controls the behavior of the upper tail
of the distribution. A trio of extreme value distributions arises as special cases of the
GEV distribution depending on the value of $\gamma$. These distributions are the Gumbel,
Fréchet, and Weibull distributions.

A Gumbel or Type I distribution arises when $\gamma$ equals or approaches zero:

$$F_{EV}(x) = \exp\left(-\exp\left(-\frac{x-\theta}{\sigma}\right)\right) \qquad (2.30)$$
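
The relation between the GEV distribution and its Gumbel limit can be illustrated numerically. The sketch below assumes the standard textbook forms, F(x) = exp(-[1 + γ(x-θ)/σ]^(-1/γ)) for the GEV and F(x) = exp(-exp(-(x-θ)/σ)) for the Gumbel case; the function names and parameter values are chosen here for illustration only.

```python
import math

def gumbel_cdf(x, theta, sigma):
    """Type I (Gumbel) CDF."""
    return math.exp(-math.exp(-(x - theta) / sigma))

def gev_cdf(x, theta, sigma, gamma):
    """GEV CDF; falls back to the Gumbel form when the shape is (numerically) zero."""
    if abs(gamma) < 1e-12:
        return gumbel_cdf(x, theta, sigma)
    t = 1.0 + gamma * (x - theta) / sigma
    if t <= 0.0:
        # Outside the support: below the lower bound for gamma > 0,
        # above the upper bound for gamma < 0.
        return 0.0 if gamma > 0.0 else 1.0
    return math.exp(-t ** (-1.0 / gamma))

# As the shape parameter shrinks, the GEV value approaches the Gumbel value.
for gamma in (0.5, 0.1, 0.01, 0.001):
    print(gamma, gev_cdf(1.0, 0.0, 1.0, gamma), gumbel_cdf(1.0, 0.0, 1.0))
```

The explicit support check matters in practice: for a nonzero shape the bracket can become non-positive, and a fractional power of a negative number is not a real value.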


For positive values of the shape parameter $\gamma$, a Fréchet or Type II distribution arises:
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X  >  0 


Here  k  is  a  positive  shape  parameter. 


(2.31) 
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.V  <  0 


(2.32) 
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l  «  )  J 

,v  >  0 


Here  k  is  a  positive  shape  parameter. 


2.1.3. Application of Extreme Value Distributions: State-of-the-Art Review

Extreme Value Theory can be found in many applications in Naval Architecture.
A typical application is estimating the maximum lifetime wave loads on a ship hull using
the Weibull distribution. McTaggart pioneered the application of extreme value distributions
to the problem of assessing the dynamic stability of ships.

McTaggart (2000, 2000a, 2000b) and McTaggart & de Kat (2000) focused on fitting
extreme value distributions to roll maxima for predicting the hourly capsize risk of a
naval combatant in a stationary seaway. He investigated the use of several
distributions for fitting the roll maxima generated from simulations using the time
domain ship motions code FREDYN. The following extreme distributions were
investigated:

•  Generalized  Extreme  Value 

•  Fréchet, referred to as a Type II Maximum Distribution

•  Gumbel 

•  Gumbel  Limited  Range  (GLR) 

•  Transformed  Gumbel 


McTaggart found that it was often difficult to obtain a satisfactory distribution fit
to the roll maxima. Since the prediction of capsize risk was the goal, the Gumbel Limited
Range (GLR) fitting technique was developed. The distribution fit applied a least squares
method to the empirical CDF, which he computed as:


N  + 


(2.33) 


In applying the least squares method, only the error in the upper portion of the sample of
roll maxima is considered. The hope was that this partial
distribution fit would be useful for extrapolating to the roll angle that was considered for
capsize.

The problem of poor distribution fit that McTaggart was attempting to solve with
the Limited Range approach was caused by the extreme non-linearity of the motion of a ship
around the peak of the ship’s righting arm (stiffness) curve. The ship starts responding
differently once the roll angle approaches or exceeds this value (to complicate matters,
this point actually fluctuates in a seaway). This change in system dynamics calls into
question one of the fundamental assumptions of Extreme Value Theory, that the data are
Independent and Identically Distributed (IID). If a sample of roll maxima has values
both larger and smaller than the roll angle where the righting arm peaks, the data may not be
identically distributed. By considering the upper portion of the data for the least squares
fitting, the Limited Range approach attempts to deal with this issue; however, the
empirical CDF being fit still contains all of the data.

In the application of the above approach, generally 30-minute simulation runs
were executed. The desired output from the process was hourly capsize risk. The
following equation is used to adjust the exposure period of the results:

$$Q_{X,D}(x) = 1 - \left[1 - Q_{X,D_S}(x)\right]^{D/D_S} \qquad (2.34)$$

where $Q_{X,D}$ is the exceedance probability (the complement of the CDF) of $X$ in
duration $D$, and $D_S$ is the duration of the simulation.
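
As a worked illustration of this exposure adjustment, the sketch below converts an exceedance probability obtained from 30-minute runs into an hourly value using Q_D = 1 - (1 - Q_Ds)^(D/Ds). The function name and the numerical values are hypothetical; the relation assumes that non-overlapping intervals of the record are independent.

```python
def adjust_exposure(q_sim, d_sim, d_target):
    """Exceedance probability over d_target, given its value q_sim over d_sim."""
    return 1.0 - (1.0 - q_sim) ** (d_target / d_sim)

# A 10 % exceedance probability in a 30-minute run, expressed per hour.
q_hour = adjust_exposure(0.10, 30.0, 60.0)
print("hourly exceedance probability:", q_hour)
```

For small probabilities the result is close to simple scaling by D/Ds, but the exact form never exceeds one, which simple scaling cannot guarantee.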

2.1.4. Method of Maximum Likelihood

The extreme value distributions considered above have two or three parameters.
Fitting the distribution to the collected data requires finding these parameters. The idea of
the maximum likelihood method is to find the values of the parameters that are “more likely”
to fit the data.

What is “more likely”? The data points that have been observed are the facts. At
the same time they are instances of a random variable. Just because they were observed,
these particular values are more likely than others. That means that the probability of
observing these particular values reaches its maximum when the correct parameters are used
for the distribution.

To illustrate the application of this principle, consider a set of $n$ identically distributed
independent random variables $x_i$, $i = 1, 2, \ldots, n$. Assume the normal distribution for the first
example:

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \qquad (2.35)$$

where $\mu$ stands for the mean value and $\sigma$ is the standard deviation. As all the random
variables from the set $x_i$, $i = 1, 2, \ldots, n$ are independent, their joint distribution is the
product of the marginal distributions (2.35):

$$f(x_1, \ldots, x_n) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right) \qquad (2.36)$$

It is not difficult to see that:

$$f(x_1, \ldots, x_n) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\right) \qquad (2.37)$$


The joint distribution (2.37) depends on two parameters: the mean value $\mu$ and
the standard deviation $\sigma$. The objective is to find the estimates for $\mu$ and $\sigma$ that
maximize the joint distribution $f(x_1, \ldots, x_n)$. As $x_1, \ldots, x_n$ are random numbers, the result of
the maximization is also a random number. Therefore the estimate is actually a result of
averaging:

$$(\mu^*, \sigma^*) = E(\hat{\mu}, \hat{\sigma}) = E\left(\arg\max_{\mu,\sigma} f(x_1, \ldots, x_n)\right) \qquad (2.38)$$


Here $E(..)$ is an averaging operator.

The instances of the random variable are particular numbers, while the
parameters $\mu$ and $\sigma$ are unknowns. It is logical to consider (2.37) as a function of the
parameters:

$$f(\mu, \sigma) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\right) \qquad (2.39)$$

The maximum of function (2.39) is to be searched; this function is usually
referred to as the likelihood function:

$$(\hat{\mu}, \hat{\sigma}) = \arg\max_{\mu,\sigma}(f(\mu, \sigma)) = \arg\max_{\mu,\sigma}(L(\mu, \sigma)) \qquad (2.40)$$


Here the symbol $L$ is used for the likelihood function.

As positions of maxima cannot be affected by monotonic transformations, the same
values will be obtained from the logarithm of the likelihood function:

$$(\hat{\mu}, \hat{\sigma}) = \arg\max_{\mu,\sigma}(\log(L(\mu, \sigma))) = \arg\max_{\mu,\sigma}(L^*(\mu, \sigma)) \qquad (2.41)$$

The base of the logarithm to choose depends on the particular form of the expression. The
natural logarithm seems to be the most reasonable choice for the formula (2.42):

$$L^*(\mu, \sigma) = -n\ln\left(\sqrt{2\pi}\,\sigma\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2 \qquad (2.42)$$

The most likely estimates for the mean value and standard deviation can now be found from:

$$\frac{\partial L^*(\mu, \sigma)}{\partial \mu} = 0; \qquad \frac{\partial L^*(\mu, \sigma)}{\partial \sigma} = 0 \qquad (2.43)$$
The first equation in the system (2.43) yields:

$$\frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu) = 0 \qquad (2.44)$$

The equation (2.44) is linear and has a unique solution:

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad (2.45)$$

Taking the average of both sides yields the sought estimate:

$$\mu^* = E(\hat{\mu}) = E\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(x_i) = \mu \qquad (2.46)$$


Therefore the observed average represents the most likely estimate for the mean value.
Consider the second equation of (2.43):

$$-\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n}(x_i-\mu)^2 = 0 \qquad (2.47)$$

Further consideration of (2.47) yields:

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2 \qquad (2.48)$$

To  complete  the  solution  of  the  second  equation  of  (2.43),  the  solution  of  the  first 
equation  (2.45)  has  to  be  substituted  into  (2.48): 
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(2  49) 
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Standard  deviation  does  not  depend  on  a  mean  value.  Therefore  equation  (2.49)  remains 
valid  after  the  following  substitution: 


$$y = x - \mu$$
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(2  50) 


Expanding  the  square  of  a  sum  in  the  equation  (2.50): 
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(2  51) 


This  leads  to  the  following  expression: 
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Appling  averaging  to  the  formula  (2.52): 


a*'  =  E 
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(2.53) 


As $x_1, \ldots, x_n$ and the corresponding $y_1, \ldots, y_n$ are independent random variables, the second
component in (2.53) represents, in fact, a correlation moment:
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(2.54) 


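Continuing consideration of (2.53) leads to:

$$\sigma^{*2} = \sigma^2 - \frac{1}{n}\sigma^2 = \frac{n-1}{n}\sigma^2 \qquad (2.55)$$

Formula (2.55) confirms the well-known fact that the variance estimate (2.48) is
biased: its mean value is $(n-1)/n$ of the true variance. The estimate with corrected bias is:

$$\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\hat{\mu})^2 \qquad (2.56)$$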


Finally,  the  maximum  likelihood  method  leads  to  the  conclusion  that,  for  the 
normal  distribution  the  most  likely  fit  is  achieved  with  well-known  formulae  for 
estimates  of  mean  value  and  standard  deviation. 
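
The conclusion for the normal distribution can be verified with a short numerical sketch. The code below is an illustration added here, with hypothetical function names and a made-up sample: it computes the sample mean, the maximum likelihood variance (division by n) and the bias-corrected variance (division by n-1), and confirms that moving either parameter away from the maximum likelihood values lowers the log-likelihood.

```python
import math

def normal_log_likelihood(xs, mu, sigma):
    """Natural logarithm of the joint normal density of an independent sample."""
    n = len(xs)
    sum_sq = sum((x - mu) ** 2 for x in xs)
    return -n * math.log(math.sqrt(2.0 * math.pi) * sigma) - sum_sq / (2.0 * sigma ** 2)

def normal_mle(xs):
    """Return (mean, maximum likelihood variance, bias-corrected variance)."""
    n = len(xs)
    mu = sum(xs) / n
    var_ml = sum((x - mu) ** 2 for x in xs) / n
    var_unbiased = var_ml * n / (n - 1)
    return mu, var_ml, var_unbiased

sample = [1.0, 2.0, 3.0, 4.0]                      # made-up data
mu_hat, var_ml, var_unbiased = normal_mle(sample)
best = normal_log_likelihood(sample, mu_hat, math.sqrt(var_ml))
print(mu_hat, var_ml, var_unbiased, best)
```

The gap between the two variance values shrinks as 1/n, so the bias correction matters mostly for short samples.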

2.2. Using Extreme Value Distribution for Evaluation of Upcrossing Rate

2.2.1. General Approach

To the authors' knowledge, the idea of using an extreme value
distribution for calculation of the upcrossing rate belongs to G. Hazen, who formulated this
idea and demonstrated the method in early 2009. At the time of writing this report,
the authors are not aware of any publications of this method by G. Hazen or anybody
else.

Consider an extreme value distribution in the form of the cumulative distribution
function (CDF) fitted over record maxima, provided all the records are of the same
duration $T_S$. By the definition of the CDF:

$$F_{EV}(a) = P(x \le a) \qquad (2.57)$$

The probability of the complement event is:

$$P(x > a) = 1 - P(x \le a) = 1 - F_{EV}(a) \qquad (2.58)$$

This probability can be interpreted as the probability of at least one upcrossing of
the level $a$ by a process $x(t)$ during the time of record $T_S$. Assuming that the level $a$ is high
enough to ensure applicability of Poisson flow, this probability can be expressed as:

$$P(x > a) = 1 - \exp(-\lambda T_S) \qquad (2.59)$$

The formulae (2.58) and (2.59) allow expressing the rate of upcrossing as:

$$\lambda = -\frac{\ln F_{EV}(a)}{T_S} \qquad (2.60)$$
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2.2.2.  Fitting  Extreme  Value  Distribution 

Following the work of G. Hazen mentioned above, a three-parameter Weibull
distribution (2.32) was used with the numerical example described in the previous section.
This example included 200 records of wave elevations; each record was 30 minutes long.
More details can be found in subsection 1.2.3.

The  dataset  for  analysis  is  formed  from  the  maximum  value  observed  over  a 
record  or  a  window  of  a  smaller  size. 


(2.61) 


The  windowing  procedure  was  implemented  to  provide  the  ability  to  control  the  time  of 
exposure. 


(2.62) 


.v,  =  max 


The shift parameter is found as the observed minimum of the dataset:

$$\theta = \min(x_i); \quad i = 1, \ldots, N_R \qquad (2.63)$$
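
A minimal sketch of the windowing step is given below, assuming each record is a list of equally spaced samples and that the windows are equal-length and non-overlapping. The function names and the toy records are hypothetical; any samples left over after the last full window are ignored in this sketch.

```python
def window_maxima(record, n_windows):
    """Maximum of each of n_windows equal, non-overlapping windows of one record."""
    size = len(record) // n_windows
    return [max(record[i * size:(i + 1) * size]) for i in range(n_windows)]

def extreme_dataset(records, n_windows):
    """Window maxima pooled over all records."""
    data = []
    for record in records:
        data.extend(window_maxima(record, n_windows))
    return data

records = [[1.0, 5.0, 2.0, 7.0, 3.0, 4.0],
           [2.0, 2.5, 6.0, 1.0, 0.5, 3.5]]      # toy data, not wave elevations
maxima = extreme_dataset(records, n_windows=2)
theta = min(maxima)                               # shift: smallest observed maximum
print(maxima, theta)
```

Using more windows per record increases the number of points but shortens the exposure time each maximum represents, which is the trade-off explored later in this subsection.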


Values for the scale and shape parameters can be found using the known relations between
the theoretical mean value $m_x$, the theoretical variance $V_x$, and these parameters:

$$m_x = \theta + \sigma\,\Gamma\left(1 + \frac{1}{k}\right); \qquad V_x = \sigma^2\left[\Gamma\left(1 + \frac{2}{k}\right) - \Gamma^2\left(1 + \frac{1}{k}\right)\right] \qquad (2.64)$$

Theoretical values of the mean and variance are unknown, therefore their estimates are
used instead:


v 


(2.65) 


A 


Then, the scale parameter $\sigma$ and the shape parameter $k$ can be found numerically
from the system of equations (2.64).
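
One possible way to solve for the two parameters numerically is sketched below. It assumes the standard Weibull moment relations, mean = θ + σΓ(1+1/k) and variance = σ²[Γ(1+2/k) - Γ²(1+1/k)]. Dividing the variance by the squared shifted mean eliminates σ and leaves a single equation in k, which decreases monotonically with k and can be solved by bisection. The function names, the bracketing interval and the test values are assumptions of this sketch.

```python
import math

def weibull_cv2(k):
    """Variance divided by squared (shifted) mean of a Weibull variable with shape k."""
    g1 = math.gamma(1.0 + 1.0 / k)
    g2 = math.gamma(1.0 + 2.0 / k)
    return g2 / g1 ** 2 - 1.0

def fit_weibull_moments(mean, var, theta, k_lo=0.1, k_hi=50.0):
    """Shape k and scale sigma matching the given mean and variance for a known shift."""
    target = var / (mean - theta) ** 2
    for _ in range(200):
        k_mid = 0.5 * (k_lo + k_hi)
        if weibull_cv2(k_mid) > target:
            k_lo = k_mid      # spread too large for this shape: increase k
        else:
            k_hi = k_mid
    k = 0.5 * (k_lo + k_hi)
    sigma = (mean - theta) / math.gamma(1.0 + 1.0 / k)
    return k, sigma

# Round trip: build moments from known parameters, then recover the parameters.
theta0, sigma0, k0 = 7.364, 2.173, 2.0
mean0 = theta0 + sigma0 * math.gamma(1.0 + 1.0 / k0)
var0 = sigma0 ** 2 * (math.gamma(1.0 + 2.0 / k0) - math.gamma(1.0 + 1.0 / k0) ** 2)
print(fit_weibull_moments(mean0, var0, theta0))
```

Bisection is slower than Newton iteration but needs no derivative of the gamma function and cannot leave the bracketing interval.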
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Alternatively  these  parameters  can  be  evaluated  using  the  method  of  maximum 
likelihood  as  described  in  detail  in  the  previous  subsection.  The  probability  density 
function  of  the  Weibull  distribution  is  expressed  as: 
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(2.66) 


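Without loss of generality, the maximum likelihood method can be
applied to the shifted random variable $y$:

$$y = x - \theta \qquad (2.67)$$

The PDF of the shifted variable can be expressed as:

$$f(y) = \frac{k}{\sigma}\left(\frac{y}{\sigma}\right)^{k-1}\exp\left(-\left(\frac{y}{\sigma}\right)^{k}\right) \qquad (2.68)$$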
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The  maximum  likelihood  estimator  L(a,  k)  can  be  expressed  as  (Cohen,  1965): 


L(a,A)  =  n/'0’,)  =  n-1- 
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(2.69) 


For simplification of further derivations, it is convenient to use the substitution:

$$\vartheta = \sigma^{k} \qquad (2.70)$$

Substitution  of  (2.70)  into  (2.69)  yields: 


(2.71) 


The  logarithmic  estimator  is  expressed  as: 
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(2.72) 


Maximization  of  the  estimator  (2.72)  requires: 
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(2.73) 


Differentiation  in  (2.71 )  leads  to  the  following  system  of  algebraic  equations: 


(2.74) 


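The unknown $\vartheta = \sigma^{k}$ can be expressed through the unknown $k$ using the first equation of the
system (2.74):

$$\vartheta = \frac{1}{n}\sum_{i=1}^{n} y_i^{k} \qquad (2.75)$$

Substitution of (2.75) into the second equation of (2.74) excludes $\vartheta$ and leads to a
nonlinear algebraic equation with only one unknown, $k$:

$$\frac{\sum_{i=1}^{n} y_i^{k}\ln y_i}{\sum_{i=1}^{n} y_i^{k}} - \frac{1}{k} - \frac{1}{n}\sum_{i=1}^{n}\ln y_i = 0 \qquad (2.76)$$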


Equation  (2.76)  then  can  be  solved  with  any  appropriate  numerical  method. 
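
A sketch of one such numerical solution is given below. It assumes the standard single-unknown form of the Weibull likelihood equation for the shape k, namely Σ yᵏ ln y / Σ yᵏ - 1/k - (1/n) Σ ln y = 0, after which the scale follows from σᵏ = (1/n) Σ yᵏ. Bisection is used because the left-hand side increases monotonically with k. The function name, the bracketing interval, and the decision to drop non-positive shifted values (the logarithm is undefined for the point that defines the shift) are assumptions of this sketch, not the report's procedure.

```python
import math
import random

def fit_weibull_mle(ys, k_lo=0.05, k_hi=50.0):
    """Maximum likelihood shape k and scale sigma for shifted data y = x - theta."""
    ys = [y for y in ys if y > 0.0]   # log(y) needs y > 0 (assumption of this sketch)
    n = len(ys)
    mean_log = sum(math.log(y) for y in ys) / n

    def g(k):
        s0 = sum(y ** k for y in ys)
        s1 = sum(y ** k * math.log(y) for y in ys)
        return s1 / s0 - 1.0 / k - mean_log

    for _ in range(100):              # g is negative below the root, positive above
        k_mid = 0.5 * (k_lo + k_hi)
        if g(k_mid) < 0.0:
            k_lo = k_mid
        else:
            k_hi = k_mid
    k = 0.5 * (k_lo + k_hi)
    sigma = (sum(y ** k for y in ys) / n) ** (1.0 / k)
    return k, sigma

# Synthetic check: samples with scale 3.0 and shape 2.0.
random.seed(0)
demo = [random.weibullvariate(3.0, 2.0) for _ in range(500)]
print(fit_weibull_mle(demo))
```

For data with very large values the powers yᵏ can overflow at the upper end of the bracket; rescaling the data or narrowing `k_hi` avoids this.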

Figure 2.1 shows an example of fitting a Weibull distribution using both the
moment method ((2.64)-(2.65)) and the method of maximum likelihood ((2.75)-(2.76)).
These data represent 200 maximum wave elevations, one observed during each record, so there
was only one window per record. This figure also shows the results of the Pearson chi-square
goodness-of-fit test for both methods. As can be seen, both methods have
passed.

θ = 7.364; d = 4

Figure 2.1. Histogram of Extreme Wave Elevation with Weibull distribution fitted

Table 4 shows the results of the goodness-of-fit test for a series of window
lengths. As can be seen from that table, once a window becomes too short the Weibull
distribution does not fit the data anymore.


Table 4. Results of Pearson chi-square goodness-of-fit test

                                                     Method of moments                  Method of maximum likelihood
N_w   Window, s   n      (illegible)   d    θ        k       σ       χ²       P         k       σ       χ²     P
1     1800        200    1.01          9    7.364    2.000   2.173   5.61     0.468     1.76    2.12    9.49   0.148
2     900         400    1.11          9    6.248    2.277   2.704   9.1318   0.425     2.13    2.671   11.3   0.259
3     600         600    1.17          12   5.579    2.407   2.979   18.20    0.110     2.304   2.958   18.4   0.105
4     450         800    1.21          14   5.123    2.481   3.160   38.72    4.03e-4   2.381   3.141   38.2   4.03e-4
5     360         1000   1.24          16   4.452    2.835   3.637   60.5     4.32e-7   2.703   3.622   54.3   4.68e-6


2.2.3. Evaluation of Confidence Intervals for Weibull Distribution

All three parameters of the Weibull distribution are determined from statistical
data and, therefore, they are random numbers. This means that the upcrossing rate
evaluated with an extreme value distribution (2.60) is, indeed, a random number. As with
any other estimate, the crossing rate (2.60) needs a confidence interval to evaluate the
statistical uncertainty involved.
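
To make the dependence of the rate on the fitted parameters concrete, the sketch below evaluates the upcrossing rate as λ = -ln(F(a))/T_S, which follows from equating 1 - F(a) with 1 - exp(-λT_S). It assumes the usual three-parameter Weibull CDF, F(x) = 1 - exp(-((x-θ)/σ)ᵏ) for x > θ. The function names are hypothetical; the parameter values are the method-of-moments results from the first row of Table 4, and the crossing levels are arbitrary.

```python
import math

def weibull_cdf(x, theta, sigma, k):
    """Three-parameter Weibull CDF; zero at and below the shift theta."""
    if x <= theta:
        return 0.0
    return 1.0 - math.exp(-((x - theta) / sigma) ** k)

def upcrossing_rate(cdf_at_level, record_duration):
    """Rate of upcrossings implied by the CDF of the record maximum at the level."""
    return -math.log(cdf_at_level) / record_duration

theta, sigma, k = 7.364, 2.173, 2.000   # Table 4, first row, method of moments
t_s = 1800.0                            # record duration, seconds
for level in (10.0, 12.0, 14.0):        # arbitrary crossing levels
    rate = upcrossing_rate(weibull_cdf(level, theta, sigma, k), t_s)
    print(level, rate)
```

Because the rate depends on all three fitted parameters through the CDF, any uncertainty in those parameters carries over to the rate, which is why a confidence interval is required.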

The expression (2.64) relates the estimates of the mean value and variance with the
parameters of the Weibull distribution. The confidence interval for these estimates can be
assessed using the conventional assumption that the estimates of the mean value and
variance are normally distributed. Caution has to be exercised, however, when
applying the normal distribution to the variance estimate, as the variance is a positive
value, while the normal distribution is supported for negative values as well.

To determine the distribution for the mean value estimate, two parameters are
needed: the mean value of the mean value estimate and the variance of the mean value
estimate. As the mean value is an unbiased estimate, its mean value is equal to the
estimate itself:

$$m(\hat{m}_x) = \hat{m}_x = \frac{1}{n}\sum_{i=1}^{n} x_i; \qquad V(\hat{m}_x) = \frac{\hat{V}_x}{n} = \frac{1}{(n-1)\,n}\sum_{i=1}^{n}(x_i - \hat{m}_x)^2 \qquad (2.77)$$

Then the half-breadth of the confidence interval for the mean estimate is:

$$\varepsilon(\hat{m}_x) = K_\beta\sqrt{V(\hat{m}_x)} \qquad (2.78)$$

The coefficient $K_\beta$ depends on the confidence probability $\beta$:

$$\beta = 0.95:\; K_\beta = 1.959964; \qquad \beta = 0.9973:\; K_\beta = 3.0 \qquad (2.79)$$

More details on the confidence interval can be found in the previous subsection
(formulae (2.54)-(2.58)).

Finally, the complete estimate for the mean looks like:

$$m_x = \hat{m}_x \pm \varepsilon(\hat{m}_x) \qquad (2.80)$$
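
The confidence interval for the mean can be sketched as follows, assuming a normally distributed mean estimate whose variance is the sample variance divided by the number of points. The coefficient for a given confidence probability is taken from the inverse normal CDF in Python's standard library; the function names and the sample are hypothetical.

```python
import math
from statistics import NormalDist

def k_beta(beta):
    """Two-sided normal coefficient for confidence probability beta."""
    return NormalDist().inv_cdf(0.5 * (1.0 + beta))

def mean_confidence_interval(xs, beta=0.95):
    """Return (mean estimate, half-breadth of its confidence interval)."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / (n - 1)   # bias-corrected variance
    return m, k_beta(beta) * math.sqrt(var / n)

sample = [1.0, 2.0, 3.0, 4.0, 5.0]                   # made-up data
m_hat, half = mean_confidence_interval(sample)
print("mean = %.3f +/- %.3f" % (m_hat, half))
print(k_beta(0.95), k_beta(0.9973))
```

The two confidence probabilities quoted in the text correspond to roughly two and three standard deviations of the estimate.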


Construction of the confidence interval for the variance estimate involves more
assumptions. If a random variable had a normal distribution, then the distribution
of its variance estimate would follow the chi-square distribution. One of the most
important qualities of the chi-square distribution is that it only supports positive values. This
corresponds to one of the most basic properties of the variance: it is not negative. The
PDF of the chi-square distribution is defined by the following formula:

$$f_{\chi^2}(x) = \frac{1}{2^{d/2}\,\Gamma(d/2)}\,x^{d/2-1}\exp\left(-\frac{x}{2}\right); \quad x > 0 \qquad (2.81)$$

The chi-square distribution depends on a single parameter $d$, which is commonly referred
to as the “degrees of freedom”. If the chi-square distribution is used for construction of
the confidence interval, the meaning of $d$ is the number of points (instances of the
random variable).

With an increased number of random values, the chi-square distribution tends
to a normal distribution; near 30 values, it is almost indistinguishable, see Figure 2.2. The
reason for the convergence of the two distributions is the Central Limit Theorem, which
states that the sum of independent, identically distributed random variables tends to a
normal distribution as the number of components in the sum increases. That is why
the normal distribution is also used for construction of the confidence interval for the
variance where the number of points is large.

At the same time, the normal distribution is supported for negative values as
well, which, in principle, allows a certain probability for negative variances. However,
the mean value of a random variable following the chi-square distribution equals the number
of degrees of freedom. Thus, with the increase in the number of points, the entire curve
moves to the right, as can be clearly seen in Figure 2.2. In the limit, when the number of
points reaches infinity, the mean value is equal to positive infinity, leaving the probability
of negative variances essentially zero. Nevertheless, caution has to be exercised while
using the normal distribution for the confidence interval of variances, especially when the
number of points is not that large and the desired confidence level is relatively high.

Figure 2.2. Chi-square (red) and normal distribution (blue)


The main advantage of the normal distribution is that it is universal for a large number of points, while the chi-square distribution can only be applied when the variable has a normal distribution. When the variable does not have a normal distribution, as in cases with extreme values, the distribution of the sum of its squares may not actually be known. However, it is known that this sum will tend to the normal distribution as the number of points increases, due to the Central Limit Theorem.

To define the normal distribution for the variance estimate, two parameters are needed: the mean value of the variance estimate and the variance of the variance estimate. It is known (and shown in the previous subsection) that the variance estimate is biased: its mean value is shifted from the mean sum of squares:


// 


(2.82) 


The variance of the variance estimate is expressed with the well-known formula:

$$\mathrm{Var}(V_x^*) = \frac{\mu_4}{n} - \frac{n-3}{n(n-1)}\,V_x^2 \qquad (2.83)$$


Here $\mu_4$ is the fourth central moment of the distribution. As the Weibull distribution has already been fitted, the PDF is known, so:

$$\mu_4 = \int_0^\infty f(x)\,(x - m_x)^4\,dx \qquad (2.84)$$

The fourth central moment of the Weibull distribution can also be calculated directly. The excess kurtosis of the Weibull distribution is given by:

$$\gamma_2 = \frac{-6\,\Gamma_1^4 + 12\,\Gamma_1^2\,\Gamma_2 - 3\,\Gamma_2^2 - 4\,\Gamma_1\,\Gamma_3 + \Gamma_4}{\left(\Gamma_2 - \Gamma_1^2\right)^2} \qquad (2.85)$$

where $\Gamma_i = \Gamma(1 + i/k)$.

The fourth central moment is then:

$$\mu_4 = (3 + \gamma_2)\,\sigma^4 \qquad (2.86)$$
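The chain from the shape parameter to the variance of the variance estimate can be sketched in a few lines. This is an illustrative sketch only: it assumes a two-parameter Weibull distribution with shape `k` and scale `scale` (hypothetical argument names), uses the textbook expression for the Weibull excess kurtosis through gamma functions, and applies the standard formula for the variance of the unbiased sample variance.

```python
import math

def weibull_excess_kurtosis(k):
    """Excess kurtosis of a Weibull distribution with shape parameter k."""
    g1, g2, g3, g4 = (math.gamma(1.0 + i / k) for i in (1, 2, 3, 4))
    numerator = (-6.0 * g1 ** 4 + 12.0 * g1 ** 2 * g2
                 - 3.0 * g2 ** 2 - 4.0 * g1 * g3 + g4)
    return numerator / (g2 - g1 ** 2) ** 2

def weibull_variance(k, scale):
    """Variance of a Weibull distribution with shape k and the given scale."""
    g1 = math.gamma(1.0 + 1.0 / k)
    g2 = math.gamma(1.0 + 2.0 / k)
    return scale ** 2 * (g2 - g1 ** 2)

def variance_of_variance_estimate(k, scale, n):
    """Variance of the unbiased sample variance computed from n points."""
    var = weibull_variance(k, scale)
    mu4 = (3.0 + weibull_excess_kurtosis(k)) * var ** 2  # fourth central moment
    return mu4 / n - (n - 3.0) / (n * (n - 1.0)) * var ** 2

if __name__ == "__main__":
    print(weibull_excess_kurtosis(2.0))
    print(variance_of_variance_estimate(2.0, 1.0, 100))
```

For k = 1 the Weibull distribution reduces to the exponential distribution, whose excess kurtosis is 6; this gives a convenient check of the gamma-function expression.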

The  half-breadth  of  the  confidence  interval  for  the  mean  estimate  is  then: 

e(0  =  VVnO  (287) 

Finally  the  complete  estimate  for  the  variance  is 

v;=m(v;)±e(v;)  (2.88) 

The third parameter is the shift. It is defined as the smallest among the observed extreme values. Therefore it is the first order statistic (see previous subsection) and has the following distribution:

$$f(\theta) = f_{(1)}(x) = N_R\,f(x)\,\bigl(1 - F(x)\bigr)^{N_R - 1} \qquad (2.89)$$

where $f(x)$ and $F(x)$ are the PDF and CDF, respectively, of the Weibull distribution, defined with formulae (2.34) and (2.68).

There is a problem with using the distribution (2.89) directly for the construction of the confidence interval. It does not support values less than the observed shift ($x < 0$). Therefore, all possible values of the shift are larger than the observed one. At the same time, there is no reason to believe that the observed shift is the smallest possible. Therefore it is assumed that the observed shift is some sort of mean value, which can be calculated as:

$$m_{(1)} = \int_0^\infty f_{(1)}(x)\,x\,dx \qquad (2.90)$$

Then the distribution (2.89) is shifted by the value $m_{(1)}$:

$$f(\theta) = N_R\,f(\theta - m_{(1)})\,\bigl(1 - F(\theta - m_{(1)})\bigr)^{N_R - 1} \qquad (2.91)$$

Both  the  original  and  shifted  distribution  are  shown  in  Figure  2.3.  Justification 
of  such  an  approach  could  be  offered  as  follows.  For  the  uni-modal,  slightly 
asymmetrical  distribution,  the  mean  value  can  be  considered  as  an  approximation  for  the 
mode.  Following  the  principle  of  maximum  likelihood,  the  observed  value  of  the  shift 
should  have  maximum  probability,  i.e.  should  correspond  to  the  mode. 
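The distribution of the first order statistic and its mean value can be illustrated with a short numerical sketch. It assumes an unshifted two-parameter Weibull distribution with shape `k` and scale `lam` (hypothetical names) and a sample of `n` extreme values. For this particular case the minimum of `n` independent Weibull variables is again Weibull distributed with scale `lam * n**(-1/k)`, which provides a closed-form check of the numerical integral.

```python
import math

def weibull_pdf(x, k, lam):
    """PDF of the two-parameter Weibull distribution for x > 0."""
    return (k / lam) * (x / lam) ** (k - 1.0) * math.exp(-((x / lam) ** k))

def weibull_cdf(x, k, lam):
    """CDF of the two-parameter Weibull distribution for x > 0."""
    return 1.0 - math.exp(-((x / lam) ** k))

def first_order_pdf(x, n, k, lam):
    """PDF of the smallest of n independent Weibull values."""
    return n * weibull_pdf(x, k, lam) * (1.0 - weibull_cdf(x, k, lam)) ** (n - 1)

def first_order_mean(n, k, lam, steps=30000):
    """Mean of the first order statistic by midpoint-rule integration."""
    upper = 30.0 * lam          # the integrand is negligible beyond this point
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x * first_order_pdf(x, n, k, lam)
    return total * h

def first_order_mean_closed_form(n, k, lam):
    """Closed-form mean of the minimum of n Weibull values."""
    return lam * n ** (-1.0 / k) * math.gamma(1.0 + 1.0 / k)

if __name__ == "__main__":
    print(first_order_mean(50, 2.0, 1.5))
    print(first_order_mean_closed_form(50, 2.0, 1.5))
```

The mean computed this way is the amount by which the distribution of the first order statistic is displaced in the approach described above.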


Figure  2.3  Distribution  of  the  first  order  statistic  and  distribution  of  shift 


Once the distribution of the shift parameter has been accepted, further calculations of the confidence interval are trivial (see details in the previous section, formulae (2.54)-(2.57)). The cumulative distribution function of the shift parameter and its derivative are expressed as:

o 

F(0)  =  {/(B)r/0;  Q(P)  =  bn{F(B))  (2.92) 

O'  m,„ 


Here the observed value of the shift is identified with an asterisk to avoid confusion with the shift parameter as a variable. The boundaries for the shift parameter are expressed as:


H  -  Q 


I-P 


;  K>=Q 


+P 


(2.93) 


Here  p  is  the  accepted  confidence  probability. 




To complete the evaluation of the lower and upper boundaries of the Weibull distribution, the variable can be scaled to correspond to the upper and lower boundaries of the estimate of the variance and shifted to accommodate variability in the mean value estimate and the shift parameter.

The  scaling  is  applied  directly  to  the  shifted  data  points  (see  formula  (2.67)): 


=  v. 
(2.94)


Scaling of the variable does not affect the shape parameter. This can be verified by solving equation (2.76) using data points scaled to the lower or upper boundary.

(2.95)

(2.96)


It  was  found  that  within  the  margin  of  numerical  tolerance: 


k  =  =  k 


(2  97) 


Therefore, as expected, scaling does not affect the shape parameter. The boundaries for the scaling parameters can be found using formulae (2.75) and (2.70):

(2.98)


The lower and upper boundaries of the Weibull distribution itself can be found by applying a respective shift to accommodate variability of the mean value and shift parameter.

Figure 2.4 shows the Weibull distribution with upper and lower boundaries calculated for $N_w = 1$.

Formulae similar to (2.99) can be written for CDFs as well:

Figure 2.5 shows the CDF for the Weibull distribution with the confidence interval.

The procedure described above has not been formally proven. Rather, it has to be considered as an approximate method allowing the evaluation of the statistical uncertainty of the upcrossing rate calculated on the basis of extreme value theory. This analysis can be found in the next subsection.


Figure 2.4 PDF for Weibull distribution for extreme values (blue), its upper (brown) and lower (red) boundaries

Figure 2.5 CDF for Weibull distribution for extreme values (blue), its upper (brown) and lower (red) boundaries


2.2.4. Evaluation of Confidence Intervals for Upcrossing Rates

Formula (2.60) relates the CDF of an extreme value distribution to the upcrossing rate. The procedure described in the above section derived the approximate CDF based on the Weibull distribution corresponding to the lower and upper boundaries of the confidence interval. This confidence interval reflects the uncertainty introduced by statistical estimates of the parameters. These boundaries are to be used to estimate the confidence interval for the rate of upcrossings:
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/dm 


In  (0">).  ,  lnta(«>) 

- ,  A  — - 


(2.101) 
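The conversion from CDF boundaries to upcrossing rate boundaries can be sketched as follows. The sketch rests on an assumption: that formula (2.60), which is not reproduced in this subsection, has the Poisson form in which the probability of no upcrossing of level a during a window of length T equals F(a) = exp(-rate * T). The function and argument names are hypothetical.

```python
import math

def rate_from_cdf(cdf_value, window_length):
    """Upcrossing rate implied by the CDF of the window maximum at the
    crossing level, assuming P(no upcrossing in T) = exp(-rate * T)."""
    if not 0.0 < cdf_value <= 1.0:
        raise ValueError("CDF value must lie in (0, 1]")
    return -math.log(cdf_value) / window_length

def rate_interval(cdf_lower, cdf_upper, window_length):
    """Confidence interval of the rate from the CDF boundaries.

    A larger CDF value means a smaller probability of exceedance, so the
    upper CDF boundary yields the lower boundary of the rate and vice versa.
    """
    return (rate_from_cdf(cdf_upper, window_length),
            rate_from_cdf(cdf_lower, window_length))

if __name__ == "__main__":
    low, up = rate_interval(0.90, 0.97, 1800.0)
    print(low, up)
```

Under this assumption the interval collapses to zero width when the two CDF boundaries coincide, and the rate is undefined where the CDF is zero, that is, below the shift parameter of the fitted distribution.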


This confidence interval is shown in Figure 2.6 along with the theoretical value and estimates obtained with other methods (described in the previous section). The level of crossing was 9 m. The method based on the extreme value distribution provided the correct estimate; the theoretical solution is inside the confidence interval. The confidence interval is slightly wider than the one based on the censored time before the first crossing.

When the crossing level is raised to 10 m (see Figure 2.7), the confidence interval of the upcrossing rate based on extreme values becomes narrower than the confidence interval based on censored time before the first upcrossing. As the crossing level becomes higher, the number of crossings decreases dramatically and the statistical uncertainty of the mean time before the first upcrossing increases. Meanwhile, the volume of the sample of extreme values does not depend on how high the crossing level is or whether there are any crossings at all. The dependence of the width of the confidence interval on the crossing level will be examined later.

A further increase of the crossing level, up to 11 m, with only 10 crossings, leads to dramatic widening of the confidence interval for the upcrossing rate based on time before the first upcrossing, see Figure 2.8. The rate estimated with the extreme value distribution retains a meaningful width of the confidence interval, which still contains the theoretical value.
Figure 2.6 Comparison of different methods to estimate upcrossing rate for the numerical example for level 9 m (total number of upcrossings 153)

Figure 2.7 Comparison of different methods to estimate upcrossing rate for the numerical example for level 10 m (total number of upcrossings 58)


Lowering the level of crossing to 7.75 m leads to significant widening of the confidence interval of the crossing rate based on the extreme value distribution (see Figure 2.9). Attempts to lower the level further make it impossible to calculate the confidence interval, as the crossing level becomes smaller than the lower boundary of the shift parameter. To alleviate this limitation, the length of the windows needs to be shortened. This also leads to a narrower confidence interval, see Figure 2.9.
The “big picture” is shown in Figure 2.10 and Figure 2.11. The estimate of the upcrossing rate, along with its confidence interval evaluated from the distribution of extreme values, is shown as a function of the crossing level. The theoretical solution is also shown in these figures. Inserts show the same curves for higher crossing levels. Figure 2.10 shows the curves for $N_w = 1$, so the window length equals the length of the record, while Figure 2.11 contains the same picture for $N_w = 2$, i.e. there are two windows per record.
Figure 2.8 Comparison of different methods to estimate upcrossing rate for the numerical example for level 11 m (total number of upcrossings 10)

The curves in both Figure 2.10 and Figure 2.11 follow the same pattern. The theoretical solution is contained within the confidence interval until the crossing level drops to a certain value. This value is about 7.4 m for $N_w = 1$ and 6.2 m for $N_w = 2$. This is caused by a limitation of the Weibull distribution: it starts from the shift parameter, which is the smallest among the observed extreme values. Naturally, the probability of exceedance cannot be evaluated below this level. Using a shorter window makes the shift parameter smaller and therefore allows working with lower crossing levels.

Looking at Figure 2.10 and Figure 2.11 makes it clear why the confidence interval increased so much in Figure 2.9 for $N_w = 1$. The crossing value of 7.75 m is not very far from the limit; the curves in Figure 2.10 turn upward and become almost vertical.

While the method based on extreme values does not work very well for smaller crossing levels, it seems to give excellent results for the higher levels, where no statistics of crossings are available. In the example considered, there is only one crossing of the 12 m level. The maximum value observed is about 12.07 m, so there is no upcrossing for the level of 13 m and higher. As can be seen from the inserts in Figure 2.10, the method continues to produce correct results well beyond that level. Therefore it has potential for statistical extrapolation.
Figure 2.9 Comparison of different methods to estimate upcrossing rate for the numerical example for level 7.75 m (total number of upcrossings 425)

Figure 2.10 Estimate of upcrossing rate based on extreme value distribution with confidence intervals as a function of crossing level, $N_w = 1$

Making the windows smaller, however, decreases the performance of the method for higher levels. As can be seen from the second insert in Figure 2.11, the theoretical solution leaves the confidence interval somewhere around 15.4 m.
Figure 2.11 Estimate of upcrossing rate based on extreme value distribution with confidence intervals as a function of crossing level, $N_w = 2$


2.3. Summary 

Order statistics describe the behavior of the k-th largest observation out of a total number n. It is a random variable and, like any other random variable, can be characterized by a distribution.

The behavior of the largest observation (the case when k = n) is the subject of Extreme Value Theory (EVT). The distribution of the extreme values is a limit distribution and does not depend on the distribution of the random variable of a stochastic process. If the extreme value distribution is applied to a stochastic process, all observations must be made over the same time interval.

The parameters of an extreme value distribution can be determined from the observations using the Method of Maximum Likelihood Estimation (MLE). The MLE method is based on the fact that the observed data points are, actually, random numbers. As these particular values were observed, they are taken to be “more likely”, i.e. their probability of occurring is maximum.

The extreme value distribution parameters are random numbers, as they are calculated from random variables. Therefore the extreme value distribution fitted to the observed data is also a random figure and subject to statistical uncertainty. The confidence interval is evaluated for the extreme value distribution as a measure of statistical uncertainty.

The extreme value distribution can be used for the evaluation of the upcrossing rate, based on the probability of no upcrossing events occurring during the observation time. The confidence interval can also be evaluated for the upcrossing rate calculated with this method.
3. Peaks over the Threshold


This  section  describes  a  method  of  statistical  extrapolation  using  the  probabilistic 
properties  of  the  peaks  of  the  envelope  that  exceed  a  given  threshold. 

3.1. The  Problem  of  Rarity 

3.1.1. Introduction

Large roll events are rare. The main objective of this work is to develop a method that would be able to characterize the probability of events that are too rare to observe in a model test or numerical simulation. This problem is known in the Naval Architecture community as “The Problem of Rarity”.

The problem of rarity arises when the average time before a stability failure may occur is very long in comparison with the natural roll period, which serves as the main time-scale for the roll motion process (definition from SLF 51/WP.2 Annex I, paragraph 6.3.2).

While the problem of rarity was the main obstacle for application of time domain methods during the last two decades, the term was introduced only relatively recently (SLF 50/4/4). A review of available treatments of this problem is available from Belenky, et al., 2008.

3.1.2. Statistical Extrapolation as a Solution of Problem of Rarity

The main challenge of the problem of rarity comes from the nonlinear nature of large-amplitude roll motions. To illustrate this statement, imagine that roll motions could be described by a linear differential equation: then the roll response could be completely characterized by a response amplitude operator in the frequency domain. As a linear operator does not change the normality of the distribution, and the wave excitation can be considered a normal process, the distribution of the response would be known to be normal. In this case, the theory of upcrossings would provide the necessary probabilistic characterization of crossing any level and the problem would be fully solved.

If the nonlinearity were mild, application of a linearization procedure could still be justified. This means that it would be possible to find a linear system that describes the roll motions with sufficient accuracy within a relatively wide range of variances of excitation.

The physical reality is different, however. It is well known that large-amplitude roll motions cannot, in general, be characterized by a normal distribution (Belenky & Sevastianov 2007). The type of distribution depends strongly on the shape of the ship's righting arm curve, which may change significantly in waves. This leaves time domain numerical simulations and model testing as the only available options to characterize the large-amplitude roll behavior of a ship.
The principle of separation is what allows the problem of rarity to be solved. Instead of one problem with very rare events, two or more related problems are considered: “non-rare” and “rare”. The “non-rare” problem, by its definition, should be solvable by a conventional numerical simulation or model test. For example, the time-split method (Belenky, et al. 2007) considered upcrossing of a level around the maximum of the GZ curve as the “non-rare” problem. The “rare” problem then considered the probability of capsizing once this threshold was crossed. The “rare” problem was solved by a series of short simulations trying to find the initial conditions at upcrossing that will lead to capsize. For a single DOF roll problem, this means finding the critical roll rate, such that exceeding this critical roll rate when the threshold is crossed leads to capsize, see Figure 3.1. The procedure for finding the critical roll rate is illustrated in the insert of Figure 3.1.

The  solutions  for  the  non-rare  and  rare  problems  can  then  be  combined.  The 
combined  solution  gives  the  probability  of  crossing  the  threshold  with  initial  conditions 
that  would  lead  to  capsize  (roll  rate  exceeding  the  critical  roll  rate). 


Figure 3.1. Summary of time-split method: separation principle and critical roll rate

The  same  principle  is  used  here.  The  “non-rare”  problem  is  crossing  a  threshold 
that  is  low  enough  that  a  statistically  significant  number  of  crossings  can  be  observed  in  a 
model  test  or  numerical  simulation.  The  “rare”  problem  is  a  statistical  extrapolation  of 
the  data  above  this  threshold,  see  Figure  3.2. 


Figure 3.2 Summary of the current method: separation principle (figure labels: upcrossing not leading to partial stability failure; to be characterized with statistical extrapolation, based on Belenky et al. 2008b)


Nonlinearity is accounted for by separating the small and large-amplitude motions with the threshold. If any sort of statistical fit is used on roll motion data in its entirety, the resulting fit will be dominated by the small-amplitude motions, where the roll motion is still relatively linear, and the influence of nonlinearity will generally not be represented properly. The threshold must therefore be high enough that the influence of nonlinearity above that threshold can be considered substantial. It cannot be chosen based purely on statistics. Physical considerations based on the shape of the GZ curve must be included as well, see Figure 3.3. These considerations, however, are outside the scope of this report, so the threshold is assumed given.


Figure 3.3 Nonlinearity and location of the threshold


3.1.3. Crossing of Two Levels

Prior to considering the statistical extrapolation, consider the relationship between the upcrossing rates of two levels. Consider a stationary differentiable process $x(t)$. The objective is to find different ways to express the upcrossing rate of level $a_2$ through the upcrossing rate of level $a_1$, provided that $a_2 > a_1$.

Generic formulae for upcrossing rates are as follows:

$$\xi_1 = f(a_1)\int_0^\infty \dot{x}\,f(\dot{x})\,d\dot{x}, \qquad \xi_2 = f(a_2)\int_0^\infty \dot{x}\,f(\dot{x})\,d\dot{x} \qquad (3.1)$$

The formulae (3.1) immediately yield one way:

$$\xi_2 = \xi_1 P, \qquad P = \frac{f(a_2)}{f(a_1)} \qquad (3.2)$$

If the first level is crossed, the value $P$ can be interpreted as the conditional probability of crossing the second level.

Consider an envelope defined as:

$$A(t) = \sqrt{x(t)^2 + y(t)^2} \qquad (3.3)$$

Here $y(t)$ is a complementary process obtained as the result of a Hilbert transform.
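A minimal sketch of computing such an envelope from a sampled record is given below. It is an illustration under stated assumptions, not the implementation used in this work: the complementary process is obtained through the discrete analytic signal (a plain DFT with the negative frequencies removed), written with the standard library only and therefore suitable just for short records. The function names are hypothetical.

```python
import cmath
import math

def analytic_signal(x):
    """Discrete analytic signal x + i*y, where y is the Hilbert transform of x."""
    n = len(x)
    # Forward DFT (direct O(n^2) evaluation, adequate for short records).
    spectrum = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    # Keep the zero frequency (and Nyquist for even n), double the positive
    # frequencies, and discard the negative frequencies.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        positive = range(1, n // 2)
    else:
        positive = range(1, (n + 1) // 2)
    for k in positive:
        weights[k] = 2.0
    # Inverse DFT of the one-sided spectrum.
    return [sum(weights[k] * spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]

def envelope(x):
    """Envelope as the modulus of the analytic signal, sqrt(x^2 + y^2)."""
    return [abs(z) for z in analytic_signal(x)]

if __name__ == "__main__":
    record = [2.5 * math.cos(2.0 * math.pi * 4 * t / 64) for t in range(64)]
    print(max(envelope(record)), min(envelope(record)))
```

For a pure cosine with a whole number of cycles in the record, the envelope is constant and equal to the amplitude, which makes a simple check of the construction.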

If an upcrossing of a level has occurred, a value of the envelope exceeding that level exists in the local vicinity. The local vicinity is defined as the interval of time during which the process $x(t)$ remains above the level $a_2$. The local maximum of the envelope (defined as a maximum of the envelope located in the local vicinity) limits the local maxima of the process. Therefore, whether the process crosses the level $a_2$ depends on whether the local maximum of the envelope exceeds this level.

The probability of the envelope exceeding the level $a_2$ can be expressed as:

$$P(A > a_2) = \int_{a_2}^{\infty} f_A(A)\,dA \qquad (3.4)$$


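The conditional probability that the envelope exceeds level $a_2$, under the condition that level $a_1$ was previously exceeded, is:

$$P(A > a_2 \mid A > a_1) = \frac{\int_{a_2}^{\infty} f_A(A)\,dA}{\int_{a_1}^{\infty} f_A(A)\,dA} \qquad (3.5)$$

Therefore another way to express the second upcrossing rate is:

$$\xi_2 = \xi_1\,\frac{\int_{a_2}^{\infty} f_A(A)\,dA}{\int_{a_1}^{\infty} f_A(A)\,dA} \qquad (3.6)$$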

Equivalence of formula (3.6) to formula (3.2) can be formally proven for a normal process, as the distribution of the envelope is known to follow the Rayleigh distribution:

$$f_A(A) = \frac{A}{V_x}\exp\left(-\frac{A^2}{2V_x}\right) \qquad (3.7)$$

Substitution of (3.7) into (3.6) yields:

$$\xi_2 = \xi_1\,\frac{\exp\left(-\dfrac{a_2^2}{2V_x}\right)}{\exp\left(-\dfrac{a_1^2}{2V_x}\right)} = \xi_1\exp\left(-\frac{a_2^2 - a_1^2}{2V_x}\right) \qquad (3.8)$$
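The equivalence for a normal process can be checked numerically. The sketch below assumes a zero-mean normal process with variance `var` (a hypothetical name) and compares three quantities: the ratio of the normal PDF values at the two levels, the ratio of the Rayleigh exceedance integrals evaluated with a midpoint rule, and the closed-form exponential.

```python
import math

def normal_pdf(a, var):
    """PDF of a zero-mean normal distribution with variance var."""
    return math.exp(-a * a / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def rayleigh_pdf(amp, var):
    """Rayleigh PDF of the envelope of a normal process with variance var."""
    return (amp / var) * math.exp(-amp * amp / (2.0 * var))

def rayleigh_tail(level, var, steps=20000):
    """Midpoint-rule integral of the Rayleigh PDF from `level` upward.

    The upper limit is truncated 12 standard deviations above the level,
    where the remaining probability is negligible.
    """
    h = 12.0 * math.sqrt(var) / steps
    return sum(rayleigh_pdf(level + (i + 0.5) * h, var) for i in range(steps)) * h

var, a1, a2 = 4.0, 3.0, 5.0
ratio_pdf = normal_pdf(a2, var) / normal_pdf(a1, var)
ratio_envelope = rayleigh_tail(a2, var) / rayleigh_tail(a1, var)
ratio_closed = math.exp(-(a2 ** 2 - a1 ** 2) / (2.0 * var))

if __name__ == "__main__":
    print(ratio_pdf, ratio_envelope, ratio_closed)
```

All three ratios agree to within the integration error, which is the content of the equivalence for the normal case.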


For any type of distribution of the process $x(t)$, comparison of formulae (3.2) and (3.6) yields:

$$P = \frac{f(a_2)}{f(a_1)} = \frac{\int_{a_2}^{\infty} f_A(A)\,dA}{\int_{a_1}^{\infty} f_A(A)\,dA} \qquad (3.9)$$


Bearing in mind that the envelope contains all the peaks, formula (3.9) can also be interpreted as the relationship between an upcrossing and a peak. As shown in Figure 3.4, if a peak is above the threshold, an upcrossing did occur, as the stochastic process is continuous. There is, therefore, a one-to-one correspondence between the occurrence of an upcrossing of a level and the occurrence of a peak value over that level. The consequence of this is a practical one: the peaks may simply be used as a surrogate for the occurrence of upcrossings.


Figure 3.4 Relations between peak and upcrossing

3.2. Properties of Peaks

As the peaks over the threshold can, in principle, be used as a surrogate for upcrossings, it makes sense to study the characteristics of peaks in detail.

3.2.1.  Distribution  of  Peaks 

The total number of positive peaks found in the wave elevation sample dataset was 31,065. Positive peaks were defined from the following conditions:

$$x_{\max} = x(t_{\max}) \;\big|\; t_{\max} : \bigl(\dot{x}(t_{\max}) = 0\bigr) \wedge \bigl(\ddot{x}(t_{\max}) < 0\bigr) \qquad (3.10)$$

Figure  3.5  shows  a  histogram  of  positive  peaks  superimposed  with  a  Rayleigh 
distribution.  A  notable  feature  of  the  positive  peak  sample  shown  in  the  histogram  is  the 
presence  of  negative  values.  They  correspond  to  secondary  peaks,  which  could  be 
expected  as  the  spectrum  is  not  narrow. 

Obviously, a Rayleigh distribution is not suitable here. It is known that peaks of a normal process characterized by a moderate-bandwidth spectrum have a Rice distribution, which tends to a Rayleigh distribution as the bandwidth decreases and to the normal distribution as the bandwidth increases.


Figure  3.5  Histogram  of  positive  peaks  and  Rayleigh  distribution 

Part of the distribution of the positive peaks, nevertheless, can be described by a truncated Rayleigh distribution, defined as follows:

$$f_a(a) = k_n(a_t)\,\frac{a}{V_x}\exp\left(-\frac{a^2}{2V_x}\right) \qquad (3.11)$$

Here $k_n$ is a normalization coefficient depending on the truncation value $a_t$, and $V_x$ is the variance of the process $x$. The normalization coefficient can be evaluated in closed form as:

$$k_n(a_t) = \left(\int_{a_t}^{\infty}\frac{a}{V_x}\exp\left(-\frac{a^2}{2V_x}\right)da\right)^{-1} = \exp\left(\frac{a_t^2}{2V_x}\right) \qquad (3.12)$$

However, it is better to evaluate it in discretized form using the width of the bucket $\Delta x$:


M°»)  = 


\b 


If  MW 

V 


0=0,  , 

l  beg 


(3.13) 


Where beg is the index corresponding to the truncation value $a_t$. The histogram also needs to be re-normalized:


f  Sb  ' 

M",)=  Z/;AV 


a,  = 


(3.14) 
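The two ways of evaluating the normalization coefficient of the truncated Rayleigh distribution can be compared with a short sketch. It is illustrative only: the discretized version below sums the Rayleigh PDF at bucket midpoints multiplied by the bucket width, which is an assumption about the discretization and not a transcription of formula (3.13). The names `var`, `a_t`, `dx` and `n_buckets` are hypothetical.

```python
import math

def rayleigh_pdf(a, var):
    """Rayleigh PDF for a process with variance var."""
    return (a / var) * math.exp(-a * a / (2.0 * var))

def k_closed_form(a_t, var):
    """Normalization coefficient from the closed-form tail integral."""
    return math.exp(a_t * a_t / (2.0 * var))

def k_discretized(a_t, var, dx, n_buckets):
    """Normalization coefficient from a sum over buckets above the truncation.

    Each bucket contributes the PDF at its midpoint times the bucket width.
    """
    total = sum(rayleigh_pdf(a_t + (i + 0.5) * dx, var) * dx
                for i in range(n_buckets))
    return 1.0 / total

def truncated_rayleigh_pdf(a, a_t, var, k_n):
    """Truncated Rayleigh PDF: zero below the truncation value."""
    return k_n * rayleigh_pdf(a, var) if a >= a_t else 0.0

if __name__ == "__main__":
    var, a_t = 4.0, 1.51
    print(k_closed_form(a_t, var), k_discretized(a_t, var, 0.1, 200))
```

Using the same discretization for the theoretical curve and the histogram keeps the two normalized over the same set of buckets, which is a plausible reason for preferring the discretized form when the fit is assessed bucket by bucket.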


The truncated Rayleigh distribution and truncated histogram are shown in Figure 3.6. The value of truncation was chosen to pass the Pearson chi-square goodness-of-fit test, the results of which are also shown in this figure.


Figure 3.6 Histogram of positive peaks and truncated Rayleigh distribution (Pearson chi-square goodness-of-fit test: number of buckets 46; start bucket 23, value 1.51; χ² = 55.54, d = 45, probability 0.135)


As can be seen, the Pearson chi-square goodness-of-fit test shown in Figure 3.6 does not reject the Rayleigh distribution for peaks starting from bin 23, which corresponds to 1.51 m. This means that above the truncation value, secondary peaks are not statistically significant, so most of the sample population consists of primary peaks that belong to the envelope. The latter circumstance leads to the applicability of the Rayleigh distribution.

A similar picture can be observed for the sample of wave elevation recorded by a wave probe moving with constant speed and direction (see subsection 4.2). For following seas, the encounter spectrum becomes very narrow-banded (see Figure 4.10 in subsection 4.4.1). The entire histogram is shown in Figure 3.7. Visually, it looks much more like a Rayleigh distribution, with a very small negative area. However, the truncation has to start at bucket 5, with the value of 0.54 m, for the Pearson chi-square goodness-of-fit test to pass, see Figure 3.8.


Figure 3.7 Histogram of positive peaks for the case with forward speed 15 knots and Rayleigh distribution

Figure 3.8 Histogram of positive peaks for the case with forward speed 15 knots and truncated Rayleigh distribution


3.2.2.  Estimate  of  Rate  of  Events  for  Positive  Peaks 

Following  the  concept  of  the  upcrossing  rate,  it  is  possible  to  come  to  similar 
formulations  for  positive  peaks  using  statistical  extrapolation. 

Consider a sample of a stochastic process $x$, presented in the form of an ensemble of $N_R$ records. Each record is represented by a time history of $N_{pt}$ points with the time step $\Delta t$, totaling $n = N_{pt} - 1$ time steps. Then the event of occurrence of a peak exceeding a given threshold, or the level $a$, can be associated with an auxiliary random variable, $W$, defined for each time step as follows (see Figure 3.9):

$$W_{i,j} = \begin{cases} 1, & x_{i,j} > a \;\cap\; x_{i,j} > x_{i-1,j} \;\cap\; x_{i,j} > x_{i+1,j} \\ 0, & \text{otherwise} \end{cases} \qquad i = 1,\dots,n; \quad j = 1,\dots,N_R \qquad (3.15)$$


Figure  3.9  Auxiliary  random  variable  for  positive  peaks  over  the  threshold 


This random variable $W$ is defined analogously to the auxiliary random variable $U$, see subsection 1.2.1. Following the same logic, the total number of all crossings is just the sum of the values of the auxiliary variable for all time steps for all records:

$$N_W = \sum_{j=1}^{N_R}\sum_{i=1}^{n} W_{i,j} \qquad (3.16)$$

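An estimate of the probability that a peak exceeding the threshold will occur at any given instant of time is given by:

$$p^*(W) = \frac{N_W}{n N_R} = \frac{1}{n N_R}\sum_{j=1}^{N_R}\sum_{i=1}^{n} W_{i,j} \qquad (3.17)$$

where $N_W$ is the total number of peaks over the threshold in all records. The mean number of peaks over the threshold per record is:

$$m^*(W) = \frac{N_W}{N_R} = \frac{1}{N_R}\sum_{j=1}^{N_R}\sum_{i=1}^{n} W_{i,j} \qquad (3.18)$$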
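The counting described above can be sketched directly on sampled records. The sketch is a hedged illustration: records are plain lists of floats, the function names are hypothetical, and the first and last samples of a record are never counted as peaks because one of their neighbors is missing, which is an assumption about the treatment of record ends.

```python
def peak_indicator(record, level):
    """Auxiliary variable W for one record: 1 at interior samples that exceed
    the level and both of their neighbors, 0 elsewhere."""
    w = [0] * len(record)
    for i in range(1, len(record) - 1):
        if (record[i] > level and record[i] > record[i - 1]
                and record[i] > record[i + 1]):
            w[i] = 1
    return w

def peak_statistics(records, level):
    """Total count of peaks over the level, the estimated probability of such
    a peak at a time step, and the mean number of peaks per record."""
    total = sum(sum(peak_indicator(r, level)) for r in records)
    time_steps = sum(len(r) - 1 for r in records)  # n steps per record
    return total, total / time_steps, total / len(records)

if __name__ == "__main__":
    ensemble = [[0.0, 2.0, 1.0, 3.0, 0.0],
                [0.0, 1.0, 0.0, 0.5, 0.0]]
    print(peak_statistics(ensemble, 1.5))
```

Dividing the estimated probability by the time step would give a quantity with the dimension of a rate, in analogy with the upcrossing rate.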


The  rate  of  events  for  the  positive  peaks  over  the  threshold  can  be  introduced 
analogously  to  the  rate  of  the  upcrossing.  Its  estimate  over  a  finite  volume  of  data  is 
defined  as: 
While the theoretical definition can be obtained as the limit for an infinite number of records and an infinitely small time step:

(3.20)


The confidence interval for the estimate of the rate can be computed using a binomial distribution for the auxiliary variable $W$, in exactly the same way as was done for upcrossings. The results for the 9 m crossing level are shown in Figure 3.10 (for the zero-speed case).


Figure 3.10 Upcrossing rate and rate of positive peaks over the threshold. Crossing level 9 m


Similar results for a range of levels are shown in Figure 3.11, while numerical
values can be seen in Table 5. For all of the cases, with one exception (a = 10 m),
the confidence interval of the estimated rate of events for positive peaks over the
threshold contains the theoretical upcrossing rate. The exception is likely due to
chance, as it is local. In fact, it can be caused by a peak at the beginning of a
record: such a peak is counted in the sample of peaks, but has no corresponding
upcrossing, as the process probably crossed the level before time zero (for the
10 m crossing level there were 60 peaks and only 58 upcrossings, so two peaks were at
the beginning of the records).

Figure 3.11 Theoretical upcrossing rate (red line) vs. rate of events of positive peaks over the
threshold


3.2.3.  Poisson  Flow  and  Positive  Peaks 

The equivalence between an upcrossing and a positive peak over the threshold can
be further illustrated by demonstrating that positive peaks over the threshold follow a
Poisson flow. A direct test of a Poisson flow was applied as described in subsection
1.3.5 on page 49.

Each record was divided into N_w windows of length T so as to keep the maximum
number of events per window below 7. Then, the sample was created by counting the
number of positive peaks over the given threshold in each window. A Pearson chi-square
goodness-of-fit test was used to check the applicability of the Poisson distribution
based on the statistically estimated rate of events for positive peaks over the
threshold. The results are shown in Table 5. The mean value and variance were estimated,
and their ratio is also included in Table 5 as an indicator of the applicability of the
Poisson distribution. Figure 3.12 shows details for the crossing level/threshold of 9 m.
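The test just described can be sketched as follows. This is an illustration under stated assumptions: the pooling rule for sparse bins (`min_expected`), the helper names, and the sample of counts are hypothetical, and the report's own binning may differ. The chi-square tail probability is computed with the standard recurrence for the regularized incomplete gamma function, which is valid for integer degrees of freedom.

```python
import math
from collections import Counter


def chi2_sf(x, dof):
    """Upper-tail probability of the chi-square distribution, integer dof >= 1."""
    z = x / 2.0
    if dof % 2:                       # odd dof: start from dof = 1
        q, a = math.erfc(math.sqrt(z)), 0.5
    else:                             # even dof: start from dof = 2
        q, a = math.exp(-z), 1.0
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while 2 * a < dof:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def poisson_test(counts, min_expected=5.0):
    """Pearson chi-square test of event counts per window against Poisson."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    k_max = max(counts)
    observed = Counter(counts)
    # Expected frequencies for k = 0..k_max-1; the upper tail goes to the last bin.
    pmf = [math.exp(-mean) * mean ** k / math.factorial(k) for k in range(k_max)]
    expected = [n * p for p in pmf] + [n * (1.0 - sum(pmf))]
    obs = [observed.get(k, 0) for k in range(k_max + 1)]
    # Pool sparse upper bins into their lower neighbour.
    while len(expected) > 2 and expected[-1] < min_expected:
        expected[-2] += expected.pop()
        obs[-2] += obs.pop()
    chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, expected))
    dof = len(expected) - 2           # bins - 1 - one estimated parameter
    p_value = chi2_sf(chi2, dof) if dof > 0 else float("nan")
    return mean, var, mean / var, chi2, dof, p_value


# Hypothetical sample: 200 windows containing 154 events in total
counts = [0] * 93 + [1] * 71 + [2] * 27 + [3] * 7 + [4] * 2
mean, var, ratio, chi2, dof, p_value = poisson_test(counts)
print(mean, var, ratio, chi2, dof, p_value)
```

A mean-to-variance ratio close to 1 and a p-value above the chosen significance level both indicate that the Poisson distribution is not rejected.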

The results from Table 5 show that the Poisson distribution is not rejected until
the threshold is lowered to between 5.5 and 5.25 m, which is the same as for
upcrossings; see section 1. This is one more indication of the statistical equivalence
of these random events. Finally, Figure 3.13 shows the histogram for the 1 m crossing
level. As expected, the Poisson distribution is rejected, as most of the data is
clustered around 3 and 4 events per 46-second window, which corresponds to a variation
around the mean period of the process of 12.6 s. A visual indication of the
inapplicability of the Poisson distribution is the sharper shape of the histogram,
caused by the concentration of data in bins near the mean period.

Crossing level 9 m
Total 154 positive peaks over the threshold
Number of time windows per record N_w = 1
Duration of time window T = 1800 s
Volume of sample 200
Estimate of mean value m_k = 0.77
Estimate of variance V_k = 0.7006
Ratio m_k / V_k = 1.099
χ² = 2.49, d = 3, P(χ², d) = 0.4769


Figure 3.12 Probability mass function of the number of positive peaks over the threshold during a
time window. Poisson distribution is not rejected


Table 5. Applicability of Poisson flow and rate of events of positive peaks over the threshold

[Table body not recoverable; surviving values: 5, 6, 7, 8, 0.00205, 4, 7, 5, 1.746, 7, 0.3326, 0.005044]

Figure 3.13 Probability mass function of the number of positive peaks over the threshold during a
time window. Poisson distribution is rejected


3.3. The  Rare  Problem 

Solving the rare problem means finding the probability that the process will
exceed a given level after an upcrossing of the threshold has occurred. The only
information available is the data of the process beyond the threshold; these data may
not go as far as the level of interest, so it is a typical extrapolation problem.

It was shown above that if the distribution of peaks is known, then the rate of
upcrossings through the given level can be found using formula (3.6). In fact, if the
data of peaks over the threshold are used to fit the distribution, this distribution is
already conditional; therefore:

$$
\lambda_2 = \lambda_1 \int_{a_2}^{\infty} f(A \mid A > a_1)\, dA = \lambda_1 \left(1 - F(a_2 \mid A > a_1)\right) \tag{3.21}
$$

Therefore, the objective of the rare problem is finding the conditional CDF of
positive peaks above the threshold.

3.3.1. POT Distribution and the Confidence Interval

For a general stochastic process, the distribution of amplitudes and the conditional
distribution of peaks above the threshold are not known. As the process of interest is a
response of a highly nonlinear dynamical system, there is very little chance that such a
distribution can be obtained in closed form. Therefore, it needs to be fitted with some
approximate formula using available data.

On the other hand, such a distribution is known to be Rayleigh for a normal
process. As the process of interest is a response of a nonlinear dynamical system to
normal excitation, it may be meaningful to try a Weibull distribution as an
approximation, keeping in mind that the Rayleigh distribution is formally a particular
case of the Weibull distribution.

Using a Weibull distribution in such a context, however, should not imply any
limiting characteristics of the distribution. In principle, the Weibull distribution is
an extreme value distribution (see section 2), that is, a limit distribution to which
the maximum value observed during a given time tends. However, fitting the Weibull
distribution to just the peaks over the threshold means that it is used only as an
approximation formula that possesses some convenient characteristics, like normalization.

Figure 3.14 shows a histogram of peaks over the threshold (for 9 m) fitted with a
Weibull distribution, along with the goodness-of-fit results for the Weibull
distribution fitted with the moment method and with the method of maximum likelihood.
Neither Weibull fit was rejected, nor was the truncated Rayleigh distribution. A similar
picture can be seen in Figure 3.15.


Figure  3.14  Fitting  the  distribution  for  positive  peaks  over  the  threshold  9  m,  154  peaks  total 


Figure 3.15 Fitting the distribution for positive peaks over the threshold 7 m, 1155 peaks total

The procedure for calculating the confidence interval for a fitted Weibull
distribution has been described in Section 2. The only difference here is that the shift
parameter is known and is equal to the threshold. Figure 3.16 shows the confidence
interval for the Weibull distribution fitted with the moment method for a 9 m threshold;
both the PDF and the CDF are shown. As demonstrated earlier, the true distribution in
this case is Rayleigh, and it is also shown in the CDF plot. It can be clearly seen that
the Rayleigh distribution differs from the fitted distribution enough that part of the
curve is actually outside the confidence band. The difference, however, is not very
large, and the theoretical curve returns to the confidence band around a peak value of
10 m.

The  results  are  better  if  the  Weibull  distribution  is  fitted  with  the  method  of 
maximum  likelihood,  see  Figure  3.17.  Here  the  theoretical  Rayleigh  distribution  remains 
within  the  confidence  band  all  the  time.  The  evident  outcome  is  that  the  method  of 
maximum  likelihood  provides  better  results;  this  is  consistent  with  existing  statistical 
practice,  where  the  moment  method  is  only  used  to  get  initial  values  for  the  method  of 
maximum  likelihood. 
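Both fitting methods can be sketched for a Weibull distribution whose shift parameter is fixed at the threshold. This is a minimal illustration, not the procedure of Section 2: the function names, the bisection brackets for the shape parameter, and the synthetic sample are assumptions. Because bisection is used here, the moment estimate is not needed as a starting value for the likelihood equation.

```python
import math
import random


def _bisect(func, lo, hi, iters=60):
    """Root of a monotonic function with a sign change on [lo, hi]."""
    f_lo = func(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fit_weibull_moments(peaks, threshold):
    """Shape and scale from the mean and variance of the excesses."""
    y = [p - threshold for p in peaks]
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / (n - 1)
    cv2 = var / mean ** 2            # squared coefficient of variation

    def g(k):
        return math.gamma(1 + 2 / k) / math.gamma(1 + 1 / k) ** 2 - 1 - cv2

    shape = _bisect(g, 0.2, 20.0)
    scale = mean / math.gamma(1 + 1 / shape)
    return shape, scale


def fit_weibull_mle(peaks, threshold):
    """Shape and scale from the maximum likelihood equations."""
    y = [p - threshold for p in peaks]
    n = len(y)
    mean = sum(y) / n
    z = [v / mean for v in y]        # rescaled for numerical safety
    logs = [math.log(v) for v in z]
    mean_log = sum(logs) / n

    def g(k):
        powers = [v ** k for v in z]
        s1 = sum(p * l for p, l in zip(powers, logs))
        return s1 / sum(powers) - 1.0 / k - mean_log

    shape = _bisect(g, 0.2, 20.0)
    scale = mean * (sum(v ** shape for v in z) / n) ** (1.0 / shape)
    return shape, scale


# Synthetic excesses over a 9 m threshold: Weibull with shape 2 and scale 2
random.seed(1)
peaks = [9.0 + random.weibullvariate(2.0, 2.0) for _ in range(2000)]
k_mom, s_mom = fit_weibull_moments(peaks, 9.0)
k_mle, s_mle = fit_weibull_mle(peaks, 9.0)
print(k_mom, s_mom, k_mle, s_mle)
```

With a large synthetic sample both methods recover the parameters; differences between them are expected to show mainly for small samples such as the 154 peaks available at the 9 m threshold.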

Lowering the threshold naturally increases the number of peaks over the threshold
and, as a result, gives a more accurate fit. As seen from Figure 3.18, the confidence
band has shrunk and contains the theoretical Rayleigh distribution.

This consideration shows that the choice of the threshold is dictated not only by
the applicability of the Poisson distribution, but also by the statistical accuracy of
fitting the distribution. This question, however, needs to be addressed later, once the
complete result has been obtained and compared with the “true” theoretical answer.


Figure 3.16 Confidence interval on Weibull distribution fitted with the moment method. The
threshold 9 m, 154 peaks total


Figure 3.17 Confidence interval on Weibull distribution fitted with the maximum likelihood method.
The threshold 9 m, 154 peaks total


Figure 3.18 Confidence interval on Weibull distribution fitted with the method of moments (a) and
the maximum likelihood method (b). The threshold 7 m, 1155 peaks total

3.3.2. Statistical Extrapolation of Peaks Over the Threshold

Formula (3.21) gives the final expression for the extrapolated estimate of the
upcrossing rate for the level of interest (the second level $a_2$):

$$
\lambda_2^* = \lambda_1^* \left(1 - F^*(a_2)\right) \tag{3.22}
$$

Here $\lambda_2^*$ is an estimate of the upcrossing rate extrapolated to the level of
interest $a_2$, $\lambda_1^*$ is the statistical estimate (by counting) of the
upcrossing rate of the given threshold $a_1$, and $F^*(a_2)$ is the CDF fitted to peaks
over the given threshold $a_1$.

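As all the terms in equation (3.22) have their confidence interval boundaries
defined, it is possible to express these boundaries for the final result:

$$
\lambda_2^{low} = \lambda_1^{low} \left(1 - F^{up}(a_2)\right) \tag{3.23}
$$

$$
\lambda_2^{up} = \lambda_1^{up} \left(1 - F^{low}(a_2)\right) \tag{3.24}
$$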
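The extrapolation step amounts to a few multiplications. The sketch below assumes a shifted Weibull CDF with hypothetical parameter names and pairs the bounds conservatively (lower rate with upper CDF, and the reverse); this pairing and all numerical values are assumptions of the example.

```python
import math


def weibull_cdf(x, threshold, shape, scale):
    """CDF of a Weibull distribution shifted to the threshold."""
    if x <= threshold:
        return 0.0
    return 1.0 - math.exp(-(((x - threshold) / scale) ** shape))


def extrapolate_rate(rate, rate_low, rate_up, cdf, cdf_low, cdf_up):
    """Extrapolated upcrossing rate at the level of interest with bounds.

    rate, rate_low, rate_up : estimate and bounds of the threshold upcrossing rate
    cdf, cdf_low, cdf_up    : fitted CDF at the level of interest and its bounds
    """
    estimate = rate * (1.0 - cdf)
    lower = rate_low * (1.0 - cdf_up)   # smallest rate, largest CDF
    upper = rate_up * (1.0 - cdf_low)   # largest rate, smallest CDF
    return estimate, lower, upper


# Made-up inputs: rate in 1/s at the threshold, CDF values at the target level
estimate, lower, upper = extrapolate_rate(4e-4, 3e-4, 5e-4, 0.99, 0.98, 0.995)
print(estimate, lower, upper)
```

In practice the CDF values would come from the fitted distribution, for instance `weibull_cdf(14.0, 9.0, shape, scale)` with the fitted shape and scale and the corresponding values on the confidence band.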

To evaluate the quality of the extrapolation, the extrapolated value can be
compared with the theoretical solution, as it is available for the case considered; see
Figure 3.19.


Here $V_x$ is the variance of the process and $V_{\dot{x}}$ is the variance of the
derivative of the process x(t).
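For a stationary normal process with zero mean, the theoretical upcrossing rate is conventionally written with Rice's formula, $\lambda(a) = \frac{1}{2\pi}\sqrt{V_{\dot{x}}/V_x}\,\exp\left(-a^2/(2V_x)\right)$. The sketch below assumes this form. The variance value is hypothetical, and the variance of the derivative is derived from it by assuming that the 12.6 s mean period quoted earlier is the mean zero-crossing period.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def rice_upcrossing_rate(level, var_x, var_xdot):
    """Upcrossing rate of `level` for a zero-mean stationary normal process."""
    return (math.sqrt(var_xdot / var_x) / (2.0 * math.pi)
            * math.exp(-level ** 2 / (2.0 * var_x)))


def mean_time_between_events(rate):
    """Mean time before/between events of a Poisson flow, in seconds."""
    return 1.0 / rate


var_x = 7.8                                         # hypothetical variance, m^2
var_xdot = var_x * (2.0 * math.pi / 12.6) ** 2      # from the assumed mean period

for level in (9.0, 14.0, 20.0):
    rate = rice_upcrossing_rate(level, var_x, var_xdot)
    years = mean_time_between_events(rate) / SECONDS_PER_YEAR
    print(level, rate, years)
```

The mean time between events grows very quickly with the level, which is why direct counting becomes impractical and extrapolation is needed.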

Figure 3.19 shows excellent quality of extrapolation, as the theoretical solution
remains within the confidence interval until the numbers become too small to handle.
Unfortunately, this is not always the case. Lowering the threshold may decrease the
quality of extrapolation; see Figure 3.20. The theoretical solution leaves the
confidence interval at the level of 11.18 m. This may not look like a good quality of
extrapolation; however, it was still able to predict amplitudes more than 4 m above the
threshold of 7 m, which is more than 50% of the threshold value.

Therefore, the choice of the threshold is important. Figure 3.21 shows the
“breakpoint” of the method as a function of the threshold. The “breakpoint” is the level
below which the extrapolation is still good. As can be seen from Figure 3.21, the
dependence is not monotonic. It remains almost horizontal up to a threshold of about
8 m, then starts to increase and reaches its maximum of 26 m somewhere around 9 m. This
is the situation shown in Figure 3.19, where the theoretical value is contained in the
confidence interval of the extrapolated value. The level of 26 m is where the
calculations were stopped, as arithmetical difficulties related to handling very small
numbers were encountered. Also, the rarity of upcrossing the level of 26 m makes its
consideration impractical, as the mean time before/between the events is around 10^20
seconds, or about 3.1·10^12 years.

Figure 3.19 Extrapolated estimate of upcrossing rate with confidence interval as a function of
crossing level vs. theoretical upcrossing rate. The threshold 9 m, 154 peaks total.


Figure 3.20 Extrapolated estimate of upcrossing rate with confidence interval as a function of
crossing level vs. theoretical upcrossing rate. The threshold 7 m, 1155 peaks total.

Around the threshold of 9.25 m, the breakpoint falls back to the 13-14 m level
(see Figure 3.23) and remains there until the 9.75 m threshold. Then it goes back to “no
breakpoint” (26 m) and remains there (Figure 3.24) until the data for fitting the
distribution “runs out”. The volume of data available for fitting the distribution is
given in Table 6.
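The breakpoint can be located by scanning the levels upward from the threshold and stopping at the first level where the theoretical rate leaves the confidence interval. The sketch below is one possible reading of that definition; the function name and the sample numbers are hypothetical.

```python
def breakpoint_level(levels, lower, upper, theoretical):
    """Highest level up to which the theoretical rate stays inside the interval.

    levels      : crossing levels in increasing order
    lower/upper : confidence bounds of the extrapolated rate at each level
    theoretical : theoretical upcrossing rate at each level
    Returns None if the theoretical rate is outside already at the first level.
    """
    last_good = None
    for a, lo, up, th in zip(levels, lower, upper, theoretical):
        if not (lo <= th <= up):
            break
        last_good = a
    return last_good


# Made-up example: the theoretical value leaves the interval at 12 m
levels = [10.0, 11.0, 12.0, 13.0]
lower = [1e-4, 1e-5, 1e-6, 1e-7]
upper = [3e-4, 3e-5, 3e-6, 3e-7]
theoretical = [2e-4, 2e-5, 5e-6, 2e-7]
print(breakpoint_level(levels, lower, upper, theoretical))
```

If the scan reaches the last level without a failure, the function returns that level, which corresponds to the “no breakpoint” case in the text.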


Figure 3.21 Breakpoint level (the level below which the extrapolation is still good) vs threshold

Figure 3.22 illustrates the performance of the method for the 14 m level. This
level is actually quite high; the mean time before/between events for upcrossing this
level is about 44 days. Roughly acceptable performance (with very few dropouts) can be
seen for thresholds exceeding 8.5 m.


Figure 3.22 Comparison of performance of extrapolation method for 14 m using different values for
the threshold


Table 6. Number of positive peaks over threshold

Figure 3.23 Extrapolated estimate of upcrossing rate with confidence interval as a function of
crossing level vs. theoretical upcrossing rate. The threshold 9.5 m, 100 peaks

Figure 3.24 Extrapolated estimate of upcrossing rate with confidence interval as a function of
crossing level vs. theoretical upcrossing rate. The threshold 10 m, 29 peaks

Based on these observations, two questions need to be answered: 1) What is the
main contributor to the quality of extrapolation? 2) How random are these results, or
can the same picture be observed with other records of the same process?

To answer the first question, consider the components of the extrapolated
estimate: the rate of upcrossing through the threshold (see Figure 3.25) and the
conditional probability that the given level (14 m) is exceeded if the threshold was
crossed (see Figure 3.26). As can be seen from Figure 3.25, the estimate of the
threshold crossing rate behaves relatively smoothly, keeping the theoretical solution
within its confidence interval. The estimate of the conditional probability in Figure
3.26 behaves in a more random manner. It “catches” the theoretical solution with its
confidence interval starting at about 8.5 m, then “loses” it, then “catches” it again.
Evidently, the estimate of the conditional probability is the one “responsible” for the
quality of extrapolation, at least for the sample considered.

To answer the second question, two more examples were considered. The process
was constructed from exactly the same spectrum and discretization as described in
Section 1. The only difference was the set of random phases, which makes these sets
independent from the first one.

Figure 3.27 shows the behavior of the “breakpoints” vs. the threshold for the two
alternative datasets. Comparing these plots with the similar one in Figure 3.21, made
from the original dataset, one can see that despite their different shapes there are
still some common features. There is a relatively monotonic part up to about 8 m, then
the dependence becomes oscillatory while keeping a similar tendency to grow. Comparing
the performance of the extrapolation method for a level of 14 m (Figure 3.28) for the
alternative datasets, one can see that for alternative dataset 1 the extrapolation
starts working at 9.5 m, and for alternative dataset 2 at 8.75 m. The original dataset
gave a value around 8.5 m (Figure 3.22), so, in general, the range is about 8.5-9.5 m.


Figure 3.25 Statistical estimate of upcrossing rate through the threshold


Figure 3.26 Extrapolated estimate of conditional probability that the process will exceed the level
of 14 m if the threshold has been crossed


a) Alternative set 1    b) Alternative set 2

Figure 3.27 Breakpoint level (the level below which the extrapolation is still good) vs threshold


a) Alternative set 1    b) Alternative set 2

Figure 3.28 Comparison of performance of extrapolation method for 14 m using different values for
the threshold

Figure 3.29 shows the behavior of the two components of the POT extrapolation
method for the two alternative datasets for the 14 m level. The upper plots show the
statistical estimate of the rate of upcrossing of the threshold. The plots are very
similar to each other and to the plot based on the original dataset in Figure 3.25. The
estimate is almost indistinguishable from the theoretical solution for the lower
thresholds. With an increase of the threshold, the estimate slowly oscillates around the
theoretical solution, while the theoretical solution remains within the confidence
interval. The interval widens monotonically with the rise of the threshold and the
decrease of the number of observed upcrossings.

The behavior of the estimate of the conditional probability (lower plots of Figure
3.29) is characterized by more oscillations. The estimates on both plots, as well as for
the original dataset in Figure 3.26, have a range of thresholds with relatively
monotonic behavior, but there the theoretical solution is not “caught” by the confidence
interval. With the increase of the threshold, the behavior becomes oscillatory, the
confidence interval widens, and the theoretical solution is now included. At least the
tendency is roughly traced, which was not clear from Figure 3.26 alone.

Concluding the consideration of these two questions, it can be stated that the
prediction capability of the method can be advanced by improving the technique for
estimating the conditional probability.

It is also clear that when the threshold is too low, the fitted conditional
distribution is dominated by the data not very far from the threshold itself, which does
not necessarily allow correct prediction of the tail of the distribution.

Figure 3.29 Statistical estimate of upcrossing rate through the threshold (upper plots) and
extrapolated estimate of conditional probability that the process will exceed the level of 14 m if the
threshold has been crossed (lower plots) for two alternative data sets

3.3.3. Alternative Solution for Rare Problem

Difficulties  predicting  the  behavior  of  the  tail  of  fitted  distributions  are  not  new. 
These  difficulties  were  one  of  the  motivations  for  the  development  of  the  extreme  value 
theory;  therefore  it  is  quite  logical  to  try  to  use  extreme  distributions  for  the  rare  problem. 

Consider the probability that no upcrossing will occur through the level $a_2$
during time T, assuming applicability of Poisson flow:

$$
P_2(n_2 = 0, T) = \exp(-\lambda_2 T) \tag{3.26}
$$

Here $n_2$ is the number of upcrossings through the level $a_2$, and $\lambda_2$ is the
rate of upcrossing through the level $a_2$.

Now consider upcrossing through the level $a_1$, such that:

$$
a_2 > a_1 \tag{3.27}
$$

The probability that no upcrossing will occur through the level $a_1$ during time T,
assuming applicability of Poisson flow, can be expressed as:

$$
P_1(n_1 = 0, T) = \exp(-\lambda_1 T) \tag{3.28}
$$

Here $n_1$ is the number of upcrossings through the level $a_1$, and $\lambda_1$ is the
rate of upcrossing through the level $a_1$.

Probability $P_2$ can be expressed through the probability $P_1$. That is, either
there are no upcrossings through the level $a_1$, and hence none through $a_2$, or there
are some upcrossings through $a_1$, but the process never reached the level $a_2$. As
the events of no upcrossing through the level $a_1$ and at least one upcrossing through
the level $a_1$ are incompatible, the probability $P_2$ is expressed as:

$$
P_2 = P_1 + P(n_2 = 0 \cap n_1 > 0) \tag{3.29}
$$

Here $P(n_2 = 0 \cap n_1 > 0)$ is the probability that no upcrossing occurs through the
level $a_2$ and there is at least one upcrossing through the level $a_1$. This
probability can be expressed as:

$$
P(n_2 = 0 \cap n_1 > 0) = P(n_2 = 0 \mid n_1 > 0)\, P(n_1 > 0) \tag{3.30}
$$

Here $P(n_2 = 0 \mid n_1 > 0)$ is the probability that no upcrossing occurs through the
level $a_2$ if there is at least one upcrossing through the level $a_1$, while
$P(n_1 > 0)$ is the probability of at least one crossing through the level $a_1$. This
is the probability of the event complementary to that of equation (3.28):

$$
P(n_1 > 0) = 1 - P_1(n_1 = 0, T) = 1 - \exp(-\lambda_1 T) \tag{3.31}
$$


Consider an extreme value distribution obtained over time T (the distribution of
the maximum value observed during time T). By definition, the cumulative distribution
function is:

$$
F_{EV}(x, T) = P(\max(x, T) < x) \tag{3.32}
$$

Here $\max(x, T)$ stands for the maximum value of the process x(t) observed during time T.

Using the technique proposed by G. Hazen and described in detail in Section 2, the
probability of the complementary event (at least one upcrossing during time T) can be
expressed using an assumed Poisson flow:

$$
1 - \exp(-\lambda_2 T) = 1 - F_{EV}(a_2, T) \tag{3.33}
$$

So, it is clear from equations (3.32) and (3.33) that the probability of no upcrossing
through the level $a_2$ is equal to the CDF of the extreme value observed during time T,
calculated for the level $a_2$:

$$
\exp(-\lambda_2 T) = F_{EV}(a_2, T) \tag{3.34}
$$


Consider the conditional distribution of an extreme value reaching the level $a_2$
under the condition that it has exceeded the level $a_1$, minding condition (3.27):

$$
f_{EV}(x = a_2, T \mid x > a_1) = \frac{f_{EV}(a_2, T)}{F_{EV}(a_1, T)} \tag{3.35}
$$

Here the divisor $F_{EV}(a_1, T)$ plays the role of a normalization coefficient.

The conditional CDF can be expressed analogously to equation (3.35):

$$
F_{EV}(x = a_2, T \mid x > a_1) = \frac{F_{EV}(a_2, T)}{F_{EV}(a_1, T)} \tag{3.36}
$$

By the definition of the CDF:

$$
F_{EV}(x = a_2, T \mid x > a_1) = P(\max(x, T) < a_2 \mid \max(x, T) > a_1) \tag{3.37}
$$

If the extreme value observed during time T has exceeded the level $a_1$, then the
number of upcrossings through this level observed during time T differs from zero:

$$
\{\max(x, T) > a_1\} \Leftrightarrow \{n_1 > 0\} \tag{3.38}
$$

If the extreme value observed during time T has not exceeded the level $a_2$, then
the number of upcrossings through this level observed during time T is zero:

$$
\{\max(x, T) < a_2\} \Leftrightarrow \{n_2 = 0\} \tag{3.39}
$$

Equations (3.38) and (3.39) lead to the following:

$$
P(n_2 = 0 \mid n_1 > 0) = P(\max(x, T) < a_2 \mid \max(x, T) > a_1) \tag{3.40}
$$

Formulae (3.37) and (3.40) relate the conditional probability of no upcrossings of the
level $a_2$, given that there are upcrossings through the level $a_1$, to the
conditional extreme value CDF:

$$
P(n_2 = 0 \mid n_1 > 0) = F_{EV}(x = a_2, T \mid x > a_1) \tag{3.41}
$$

Substitution of (3.41) and (3.36) into (3.29), taking into account (3.30) and (3.31),
leads to:

$$
P_2 = P_1 + (1 - P_1) \frac{F_{EV}(a_2, T)}{F_{EV}(a_1, T)} \tag{3.42}
$$

Taking into account formulae (3.26) and (3.28) allows expressing the rate of
upcrossings of the level $a_2$ through the rate of upcrossing of the level $a_1$ and the
extreme value distributions:

$$
\lambda_2 = -\frac{1}{T} \ln\left[ \exp(-\lambda_1 T) + \left(1 - \exp(-\lambda_1 T)\right) \frac{F_{EV}(a_2, T)}{F_{EV}(a_1, T)} \right] \tag{3.43}
$$

Formula (3.43) represents the complete solution of the problem, that is, a
combination of the rare and non-rare problems, with a different formula than (3.8). It
is trivial, however, to express the solution of the rare problem from (3.43) explicitly:

$$
P = -\frac{1}{\lambda_1 T} \ln\left[ \exp(-\lambda_1 T) + \left(1 - \exp(-\lambda_1 T)\right) \frac{F_{EV}(a_2, T)}{F_{EV}(a_1, T)} \right] \tag{3.44}
$$

Formula (3.44) allows the use of an extreme value distribution for the rare
problem.
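Formulae (3.43) and (3.44) can be evaluated directly. In the sketch below, `cond_cdf` stands for the conditional extreme value CDF evaluated at the level of interest (the ratio that appears in both formulae); the function names and numerical values are assumptions. The expression under the logarithm is rewritten as 1 - (1 - exp(-λ₁T))(1 - cond_cdf) so that `expm1` and `log1p` can be used, which helps when the rates are very small.

```python
import math


def rate_from_extreme_value(rate1, window, cond_cdf):
    """Upcrossing rate of the level of interest, following formula (3.43).

    rate1    : upcrossing rate of the threshold, 1/s
    window   : duration T over which the extreme value is taken, s
    cond_cdf : conditional extreme value CDF at the level of interest
    """
    p_any = -math.expm1(-rate1 * window)      # 1 - exp(-rate1 * T)
    return -math.log1p(-p_any * (1.0 - cond_cdf)) / window


def rare_solution(rate1, window, cond_cdf):
    """Solution of the rare problem, following formula (3.44)."""
    return rate_from_extreme_value(rate1, window, cond_cdf) / rate1


# Made-up inputs: threshold rate 4e-4 1/s, window 1800 s, conditional CDF 0.99
print(rate_from_extreme_value(4e-4, 1800.0, 0.99))
print(rare_solution(4e-4, 1800.0, 0.99))
```

Two limiting cases serve as a sanity check: a conditional CDF of 0 returns the threshold rate itself, and a conditional CDF of 1 returns a zero rate. When λ₁T is small, the result approaches λ₁(1 - cond_cdf), which has the same form as (3.22).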

3.3.4.  Extreme  Value  Distribution  for  Peak  over  Threshold 

To use the alternative solution for the rare problem described in the previous
section, the conditional extreme value distribution of peaks over the threshold is
needed. The procedure of fitting the extreme value distribution was described in detail
in section 2 and is briefly revisited below.

First, the window size has to be set. It should be large enough so that the
maximum value observed in each window can be considered as an independent
realization. For all further calculations, the window size was taken equal to the
record length unless otherwise stated.

The sample data has to be collected to fit a Weibull distribution. The data points
are the maximum values observed in each window. To avoid dealing with uncertainty of
the shift parameter, only points that have exceeded the threshold $a_1$ are collected.
As a result, the fitted distribution is actually the conditional extreme value
distribution needed for formulae (3.43) and (3.44). The conditional distribution in CDF
form is expressed as:


$$
F_{EV}(x_{max} \mid x_{max} > a_1) = \frac{F_{EV}(x_{max}, T)}{F_{EV}(a_1, T)} =
\begin{cases} 1 - \exp\left[-\left(\dfrac{x_{max} - a_1}{A}\right)^{\alpha}\right], & x_{max} \ge a_1 \\ 0, & x_{max} < a_1 \end{cases} \tag{3.45}
$$


The distribution parameters A and α are determined using the method of
maximum likelihood, with the initial values coming from the method of moments.
Evaluation of the confidence interval is no different from the previous case. An
example for the threshold value of 9 m is shown in Figure 3.30.
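Collecting the sample for this fit can be sketched as follows. The function name and the toy records are hypothetical; the window length is given in number of samples, and an incomplete window at the end of a record is dropped, which is an assumption of this sketch.

```python
def window_maxima_over_threshold(records, window_points, threshold):
    """Maximum of each time window, keeping only maxima above the threshold."""
    maxima = []
    for x in records:
        # Step through complete windows only.
        for start in range(0, len(x) - window_points + 1, window_points):
            peak = max(x[start:start + window_points])
            if peak > threshold:
                maxima.append(peak)
    return maxima


# Two made-up records of six samples each, threshold 9 m
records = [
    [1.0, 5.0, 2.0, 9.5, 3.0, 4.0],
    [2.0, 3.0, 10.2, 1.0, 0.0, 8.0],
]
print(window_maxima_over_threshold(records, 3, 9.0))   # windows of 3 samples
print(window_maxima_over_threshold(records, 6, 9.0))   # window = whole record
```

The returned maxima, shifted by the threshold, are the data to which the Weibull distribution is then fitted.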


Figure 3.30  Fitting  the  Weibull  distribution  with  confidence  interval  for  peak  over  the  threshold 
data,  the  threshold  9  m,  and  window  time  1800  s  (time  duration  of  entire  record) 


3.3.5. Extrapolation with Extreme Value Distribution of POT

Once the distribution has been fitted, formula (3.43) can be used for extrapolation.
Figure 3.31 shows sample results for the 9 m threshold, using the distribution shown in
Figure 3.30.


Figure 3.31 Extrapolated estimate of upcrossing rate with confidence interval as a function of
crossing level vs. theoretical upcrossing rate based on extreme value distribution of peaks over the
threshold. The threshold 9 m, 111 peaks

Unfortunately, the good extrapolation shown in Figure 3.31 does not mean that it
remains the same for any threshold. Figure 3.32 shows the “breakpoint” value for the
extrapolation based on the extreme value distribution. In principle, the general picture
is similar to Figure 3.21; however, the lower level of the breakpoint seems to be a bit
higher.


Figure  3.32  Breakpoint  level  (the  level  below  which  the  extrapolation  is  still  good)  for  the 
extrapolation  based  on  extreme  value  distribution  vs  threshold 

The difference between the two extrapolation techniques becomes clearer when
comparing the rare solution calculated for a particular target level of 14 m (the level
where the prediction is needed); see Figure 3.33 and compare with Figure 3.26. In
general, the solution based on the extreme value distribution is closer to the
theoretical one. Most importantly, the prediction is correct for relatively low
thresholds, where more data exist and the confidence interval is narrower.

Figure 3.33 The extrapolated estimate of conditional probability that the process will exceed the
level of 14 m if the threshold has been crossed (based on extreme value distribution)

Similar conclusions can be made comparing the complete results of extrapolation
between the two methods; see Figure 3.34 and Figure 3.22. Acceptable performance can
be observed for almost the entire range of thresholds for the extreme-value-based
technique in Figure 3.34, in contrast to the direct POT fitting shown in Figure 3.22.
This difference can be explained by the much better behavior of the tail of the extreme
value distribution model, as the tail is its main “specialty”.


Figure 3.34 Comparison of performance of extrapolation method for 14 m using different values for
the threshold (based on extreme value distribution)


Figure  3.34  shows  some  oscillation  of  the  extrapolated  estimate  around  the 
theoretical  solution.  As  the  threshold  increases,  the  oscillations  become  larger  and  the 
confidence  interval  no  longer  contains  the  theoretical  solution.  The  deterioration  of  the 
extrapolated  estimated  made  with  increasing  threshold  can  be  explained  by  a  decrease  in 
the  amount  of  available  data  to  fit  the  extreme  value  distribution,  as  fewer  and  fewer 
values  exceed  the  threshold. 

Averaging the estimates over a portion of the threshold range seems to be the
natural way to improve the stability of these calculations. As is known from experience,
100-150 data points are considered the minimum amount of data needed to fit the distribution. This
number can be used as a criterion for the high-end threshold.

$$\lambda_a = \frac{1}{N_{a1}} \sum_{i=1}^{N_{a1}} \lambda(a_{1i}) \tag{3.46}$$

Here $\lambda_a$ is the extrapolated estimate averaged over $N_{a1}$ threshold values $a_{1i}$, and $\lambda(a_{1i})$ is
the value of the extrapolated estimate based on the threshold $a_{1i}$.

Lower and upper boundaries of the confidence interval can also be averaged, as
a first approximation:

$$\lambda_a^{low} = \frac{1}{N_{a1}} \sum_{i=1}^{N_{a1}} \lambda^{low}(a_{1i}) \; ; \qquad \lambda_a^{up} = \frac{1}{N_{a1}} \sum_{i=1}^{N_{a1}} \lambda^{up}(a_{1i}) \tag{3.47}$$

Here $\lambda_a^{low}$ and $\lambda_a^{up}$ are the lower and upper boundaries of the confidence interval for the
averaged extrapolated estimate; $\lambda^{low}(a_{1i})$ and $\lambda^{up}(a_{1i})$ are the boundaries for the
extrapolated estimate based on the threshold $a_{1i}$.
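
The averaging described by (3.46) and (3.47) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `average_over_thresholds` and the numerical values are hypothetical and are not taken from the report's data; the per-threshold estimates and their confidence boundaries are assumed to be available already.

```python
import numpy as np

def average_over_thresholds(estimates, lower, upper):
    """Arithmetic mean of per-threshold estimates and of their CI boundaries."""
    estimates = np.asarray(estimates, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    if not (estimates.shape == lower.shape == upper.shape):
        raise ValueError("all inputs must have the same length")
    return estimates.mean(), lower.mean(), upper.mean()

# Hypothetical estimates for three thresholds (illustration only, not report data)
rates = [1.0e-6, 3.0e-6, 2.0e-6]
lows = [0.5e-6, 1.5e-6, 1.0e-6]
ups = [2.0e-6, 6.0e-6, 4.0e-6]
avg, avg_low, avg_up = average_over_thresholds(rates, lows, ups)
```

Only thresholds that still leave enough points for a fit (the 100-150 point criterion mentioned above) would normally be passed to such a function.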

Figure 3.35 shows the result for the target level $a_2$ = 14 m. The theoretical solution
falls almost exactly in the middle of the confidence interval.

Figure 3.35 Level 14 m: theoretical solution and extrapolated estimate averaged for the thresholds
7.5-8.5 m. The distribution for the threshold 8.5 m was fitted with 149 points

Figure 3.36 shows the performance for all levels from $a_2$ = 9 to 22 m. The breakpoint is
20 m, which gives an upcrossing rate of $5.5 \cdot 10^{-13}$ s$^{-1}$; this is more than enough for practical
calculations, as the mean time to an event is about 57,000 years.

Successful  application  of  the  averaging  over  several  thresholds  for  the  current 
numerical  example  does  not  yet  prove  that  it  will  work  as  well  for  all  other  cases.  While 
it  seems  to  be  impossible  to  prove,  it  still  makes  sense  to  try  it  at  least  on  two  alternative 
data  sets  used  earlier  in  this  section.  Figure  3.37  shows  the  dependence  of  the 
breakpoints  of  these  datasets  as  a  function  of  the  threshold,  similar  to  Figure  3.32.  The 
lowest  point  is  about  13  m  in  both  cases. 


Figure 3.36 Theoretical solution and extrapolated estimate averaged for the thresholds 7.5-8.5 m. The
distribution for the threshold 8.5 m was fitted with 149 points


Figure 3.37 Breakpoint level (the level below which the extrapolation is still good) for the
extrapolation based on extreme value distribution vs threshold for two alternative data sets

Figure 3.38 shows the behavior of the rare solution and the complete extrapolated
estimate for $a_2$ = 14 m using the two alternative datasets. The behavior is essentially
similar to that of the original set seen in Figure 3.33 and Figure 3.34. For quite a number of
threshold values the estimate is able to "catch" the theoretical solution in its confidence
interval.

Figure 3.39 shows results of the averaging technique for the two alternative
datasets. The performance is not as impressive as for the original dataset, but is still usable.

Figure 3.38 Extrapolated estimate of conditional probability that the process will exceed the level of
14 m if the threshold has been crossed - rare solutions (upper plots: a, b) and complete extrapolated
estimate (lower plots: c, d) for two alternative data sets for $a_2$ = 14 m. Panels: a) rare solution,
alternative set 1; b) rare solution, alternative set 2

Figure 3.39 Level 14 m: theoretical solution and extrapolated estimate averaged for the thresholds.
Set 1: range 7.5-8.5 m; the distribution for the threshold 8.5 m was fitted with 145 points. Set 2:
range 7.5-8.6 m; 146 points for the threshold 8.6 m. Panels: a) Alternative set 1; b) Alternative set 2

Finally, Figure 3.40 compares the theoretical solution with the extrapolated
estimate averaged over a range of thresholds. Breakpoints for the alternative
datasets lie at the levels of 15.25 m and 21.1 m, respectively. Even the lowest breakpoint
corresponds to a rate of $2.531 \cdot 10^{-8}$ s$^{-1}$, for which the mean time before the event is
1.25 years. Of course, this is less than the original dataset has shown, but it is still good enough
for 200 records of 30 minutes each, as the method allowed extrapolation from 100 hrs of data
to 10,000 hrs.

Figure 3.40 Theoretical solution and extrapolated estimate averaged for the alternative data sets.
Set 1: range 7.5-8.5 m; 145 points for the threshold 8.5 m
Set 2: range 7.5-8.6 m; 146 points for the threshold 8.6 m
Panels: a) Alternative set 1; b) Alternative set 2


3.4. Summary 

The main difficulty associated with characterizing the likelihood of a large roll
angle occurring is related to the problem of rarity. The nonlinearity of the dynamical
system describing large roll motions of a ship creates additional difficulties. The natural
frequency of roll of a ship changes as a function of roll amplitude. This frequency shift
makes the response significantly different for small- and large-amplitude motions. These
difficulties are overcome by separating the problem into "non-rare" and "rare"
sub-problems. The "non-rare" sub-problem is based on relatively small-amplitude motions.
Its solution provides the rate of upcrossing a threshold, a boundary that separates small,
almost linear motions from moderate- and large-amplitude motions, where nonlinearity is
significant. This threshold is the boundary between the "non-rare" and "rare"
sub-problems. The "non-rare" problem was considered in Section 1 of this report; further
work is focused on the "rare" problem.

The solution of the "rare" problem is based on the statistical properties of the data
points above the threshold. The idea is to use them to predict large angles, as the
influence of nonlinearity is already significant above the threshold. Among the data
points above the threshold, the peaks are of special interest, and they can also be
considered as a Poisson flow. It can also be shown that reaching a peak above the
threshold is an event equivalent to an upcrossing of this threshold.

A distribution that is fit to the peaks over the threshold is a conditional
distribution. This conditional distribution describes the probability that, once the
threshold is crossed, a higher level is crossed. This constitutes the solution of the "rare"
problem. The distribution may also be considered as an extreme value distribution, in
which case the maximum values from fixed-length windows are used in place of the
peaks. The extreme value distribution describes the behavior of the tail of a distribution
and may therefore provide more accurate extrapolation. Averaging results obtained with
several thresholds also seems to improve the accuracy of the method.

4. Envelope Theory

This section examines the properties of the envelope and then considers its
application, along with upcrossing theory, as a method of evaluation of the probability of
rare events.

4.1. Definition  and  Background 

As was demonstrated in Section 1, a violation of Poisson flow is caused by
too many crossings in neighboring periods. This is especially pronounced when the
spectrum is narrow, which leads to significant grouping or clustering. Such a situation
may be typical for following and stern quartering waves, when the encounter spectrum
can become very narrow. Envelope theory may be useful in these cases. Belenky and
Breuer (2007) show an example of successful application of the envelope for the case of
parametric roll, a process known for its narrow spectrum.

Most Naval Architecture applications dealing with irregular waves use a Fourier
presentation of a stochastic process:

$$x(t) = \sum_{i=1}^{N_\omega} r_{Wi} \cos(\omega_i t + \varphi_i) \tag{4.1}$$

Here $\omega_i$ is the set of frequencies used for discretization of the given spectral density, $r_{Wi}$ is
the amplitude of the $i$-th component and $\varphi_i$ is the phase shift for the $i$-th component. If a process
is normal, as in the case of elevations of irregular waves, the amplitudes of the components are
taken from a spectrum, while the phase shifts are considered as a set of independent random
numbers with uniform distribution from 0 to 360 degrees.

The concept of the envelope came from the envelope presentation, which is an
alternative way to describe the time history of a stochastic process:

$$x(t) = a(t) \cos(\Phi(t)) \tag{4.2}$$

Here the process $x(t)$ is presented through two other stochastic processes:
the amplitude or envelope $a(t)$ and the phase $\Phi(t)$. Originally the envelope presentation was
developed for stationary normal processes (Rice, 1944, 1945); the principles it is based
upon may, however, be extendible to non-Gaussian processes as well. The role of the phase
$\Phi(t)$ is to keep the "memory" of the process; it makes sure that the presentation (4.2)
does not alter the autocorrelation function of the process $x(t)$. The role of the envelope
$a(t)$ is to ensure the variance is maintained; see (Belenky, et al. 2006), where an example
application of the envelope presentation is shown.

Formally, the envelope is defined through a complementary stochastic process.
The complementary process $y(t)$ is defined as:

$$y(t) = \sum_{i=1}^{N_\omega} r_{Wi} \sin(\omega_i t + \varphi_i) \tag{4.3}$$

The stochastic processes $x(t)$ and $y(t)$ are not correlated if the time is fixed, as the
correlation moment at the fixed time is zero, since the phases were shifted 90 degrees (if
the processes $x(t)$ and $y(t)$ are normal, they are also independent at the fixed time).
However, the values of the processes may be correlated if they are taken at different
instants of time. The operation of obtaining the complementary process is known as the
Hilbert transform.

The envelope $a(t)$ is defined as:

$$a(t) = \sqrt{x(t)^2 + y(t)^2} \tag{4.4}$$

The envelope is a stochastic process with its own autocorrelation function and
distribution that differs from the distribution of $x(t)$ and $y(t)$.
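
The construction of the envelope from formulas (4.1), (4.3) and (4.4) can be sketched numerically. The frequency grid, the bell-shaped amplitude set and the record length below are hypothetical choices made only for illustration; they are not the spectrum of the report's numerical example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discretization: 40 frequencies with a bell-shaped amplitude set
omega = np.linspace(0.4, 1.6, 40)                 # rad/s
r = 0.2 * np.exp(-((omega - 1.0) / 0.25) ** 2)    # component amplitudes
phi = rng.uniform(0.0, 2.0 * np.pi, omega.size)   # independent uniform phases

t = np.linspace(0.0, 600.0, 6001)
arg = np.outer(t, omega) + phi                    # shape (len(t), len(omega))

x = (r * np.cos(arg)).sum(axis=1)                 # original process, formula (4.1)
y = (r * np.sin(arg)).sum(axis=1)                 # complementary process, formula (4.3)
a = np.hypot(x, y)                                # envelope, formula (4.4)
```

By construction the envelope bounds both processes from above, which is the property that Figure 4.1 illustrates with the envelope and its negative reflection.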

Further considerations rely on the same numerical example of wave elevations,
which is described in detail in Section 1. Figure 4.1 shows the envelope along with the process
it was derived from. The negative reflection of the envelope has been added for better
visualization. Figure 4.1 makes clear the origin of the term "envelope". The actual
envelope and its negative reflection cover the entire process and serve as its outer
boundary.

Figure 4.1 Stochastic process of wave elevations and its envelope. Negative reflection of the envelope
is added for visualization only

A closer look reveals that the envelope is not just a smooth curve that connects
the peaks of the process. It accounts for negative peaks (case A in Figure 4.1). Sometimes
the envelope has a peak when the process itself has neither a negative nor a positive peak
(case B in Figure 4.1). The origin of this "unclaimed" peak of the envelope can be
clarified by plotting the complementary process $y(t)$ along with the original process $x(t)$
and its envelope, see Figure 4.2. It becomes clear that a peak of the envelope can
also be caused by a peak of the complementary process.

Moreover, it is possible to show that all the points of the envelope are, in fact,
peaks of the process $x(t)$ shifted by an angle. Consider a process $z(t)$:

$$z(t \,|\, \gamma) = \sum_{i=1}^{N_\omega} r_{Wi} \cos(\omega_i t + \varphi_i - \gamma) \tag{4.5}$$

Here $\gamma$ is the shift angle. Figure 4.3 shows the shifted process $z(t)$ along with the original
process $x(t)$ and the complementary process $y(t)$. Each peak of the shifted process $z(t)$ belongs
to the envelope. This provides a graphical interpretation of the envelope and explains the
origins of its peaks.
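
The relation between the shifted process (4.5) and the envelope can be checked numerically. Since $z(t\,|\,\gamma) = x(t)\cos\gamma + y(t)\sin\gamma = a(t)\cos(\Phi(t) - \gamma)$, the shifted process never exceeds the envelope, and at every instant some shift angle brings it up to the envelope. The sketch below uses a hypothetical amplitude set and a 0.5-degree grid of shift angles; both are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical amplitude set and phases (not the report's spectrum)
omega = np.linspace(0.4, 1.6, 30)
r = 0.2 * np.exp(-((omega - 1.0) / 0.25) ** 2)
phi = rng.uniform(0.0, 2.0 * np.pi, omega.size)
t = np.linspace(0.0, 150.0, 1501)
arg = np.outer(t, omega) + phi

x = (r * np.cos(arg)).sum(axis=1)
y = (r * np.sin(arg)).sum(axis=1)
a = np.hypot(x, y)

def shifted(gamma):
    """z(t | gamma) from formula (4.5), summed component by component."""
    return (r * np.cos(arg - gamma)).sum(axis=1)

gammas = np.linspace(0.0, 2.0 * np.pi, 721)        # 0.5-degree steps
z_all = np.stack([shifted(g) for g in gammas])     # shape (len(gammas), len(t))
z_max = z_all.max(axis=0)                          # upper bound over all shifts
```

Here `z_max` reproduces the envelope `a` to within the resolution of the shift-angle grid, which is the numerical counterpart of Figure 4.3.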


Figure 4.2 Origin of peaks of envelope: original stochastic process x(t) and its complementary process
y(t)

Figure 4.3 Envelope and peaks of the shifted process

4.2. Outline of the Theory of the Envelope

This subsection outlines the basics of the theory of the envelope for
a normal process. It is assumed that the process $x(t)$ is normal and has zero mean value; this
does not limit generality, as it is always possible to introduce a shift. The consideration
generally follows the classical text by Sveshnikov (1968).

4.2.1. Distribution of the Envelope and the Phase

The complementary process $y(t)$, as the result of the Hilbert transform, can also be
expressed as:

$$y(t) = \sum_{i=1}^{N_\omega} r_{Wi} \sin(\omega_i t + \varphi_i) = a(t) \sin(\Phi(t)) \tag{4.6}$$

The assumption of normality is extended to the complementary process $y(t)$. It
follows naturally if the Fourier presentation (4.1) is used for the original process $x(t)$ and
a uniform distribution of the phases of the components is assumed.

Consider the probability that the envelope takes a particular value. Taking into
account (4.4), it can be expressed in the form of the following inequality:

$$a < \sqrt{x^2 + y^2} < a + da \tag{4.7}$$

The probability of satisfying the inequality (4.7) is directly related to the PDF
of the envelope $f(a)$:

$$P\left(a < \sqrt{x^2 + y^2} < a + da\right) = f(a)\,da \tag{4.8}$$

The probability (4.8) can be evaluated if the joint distribution of $x$ and $y$ is known:

$$f(a)\,da = P\left(a < \sqrt{x^2 + y^2} < a + da\right) = \iint\limits_{a < \sqrt{x^2 + y^2} < a + da} f(x, y)\,dx\,dy \tag{4.9}$$

Here $f(x, y)$ is the joint distribution of the original process and its complementary process. As
a normal distribution was assumed for both of them, the joint distribution is expressed as:


$$f(x, y) = \frac{1}{2\pi\sqrt{V_x V_y \left(1 - r_{xy}^2\right)}} \exp\left(-\frac{1}{2\left(1 - r_{xy}^2\right)} \left(\frac{x^2}{V_x} - \frac{2 r_{xy}\, x y}{\sqrt{V_x V_y}} + \frac{y^2}{V_y}\right)\right) \tag{4.10}$$

Here $V_x$ and $V_y$ are the variances of the original and complementary processes, respectively; $r_{xy}$
is the correlation coefficient of the original and complementary processes.

The variances of the original and complementary processes are identical. Taking
into account presentations (4.1) and (4.3):

$$V_x = V_y = \frac{1}{2} \sum_{i=1}^{N_\omega} r_{Wi}^2 \tag{4.11}$$


As noted in the previous subsection, the original and the complementary processes
are not correlated, as the shift between them is 90 degrees. This can be clearly seen from the
known formula for the correlation coefficient between two processes expressed with
Fourier series:

$$r_{xy} = \frac{1}{2V_x} \sum_{i=1}^{N_\omega} r_{Wi}^2 \cos(\Delta\varphi_i) = \frac{1}{2V_x} \sum_{i=1}^{N_\omega} r_{Wi}^2 \cos\left(\varphi_i - (\varphi_i - 0.5\pi)\right) = \frac{1}{2V_x} \sum_{i=1}^{N_\omega} r_{Wi}^2 \cos(0.5\pi) = 0 \tag{4.12}$$

Here $\Delta\varphi_i$ is the difference between the phases of the components. For the details of the derivation of
this formula, see (Belenky & Sevastianov 2007) or (Belenky, et al., 2007).

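Taking into account (4.11) and (4.12), the distribution (4.10) can be simplified:

$$f(x, y) = \frac{1}{2\pi V_x} \exp\left(-\frac{x^2 + y^2}{2V_x}\right) \tag{4.13}$$

Substitution of the distribution (4.13) into (4.9) and transition to polar
coordinates yields:

$$
\begin{aligned}
f(a)\,da &= \iint\limits_{a < \sqrt{x^2 + y^2} < a + da} f(x, y)\,dx\,dy
= \frac{1}{2\pi V_x} \iint\limits_{a < \sqrt{x^2 + y^2} < a + da} \exp\left(-\frac{x^2 + y^2}{2V_x}\right) dx\,dy \\
&= \left[\; a = \sqrt{x^2 + y^2},\;\; x = a\cos(\Phi);\quad \Phi = \arctan\left(\frac{y}{x}\right),\;\; y = a\sin(\Phi) \;\right] \\
&= \frac{1}{2\pi V_x} \int\limits_{a}^{a + da} \int\limits_{0}^{2\pi} a \exp\left(-\frac{a^2}{2V_x}\right) d\Phi\,da
\end{aligned}
\tag{4.14}
$$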


At the same time, consider $f(a)$ as a marginal distribution of the joint distribution $f(a, \Phi)$:

$$f(a)\,da = \int\limits_{a}^{a + da} \int\limits_{0}^{2\pi} f(a, \Phi)\,d\Phi\,da \tag{4.15}$$

Comparison of (4.14) and (4.15) shows that the joint distribution $f(a, \Phi)$ is expressed as:

$$f(a, \Phi) = \frac{a}{2\pi V_x} \exp\left(-\frac{a^2}{2V_x}\right) \tag{4.16}$$

The right-hand side does not contain the variable $\Phi$. It means that the variables $a$ and $\Phi$
are independent. The PDF of $a$ can be found by the integration of (4.16) over $\Phi$ from 0 to
$2\pi$:

$$f(a) = \int\limits_{0}^{2\pi} f(a, \Phi)\,d\Phi = \frac{a}{V_x} \exp\left(-\frac{a^2}{2V_x}\right) \tag{4.17}$$

This distribution is known as a Rayleigh distribution.

The distribution of the phase can be easily found from formula (4.16) using
the established fact of independence of the envelope and the phase:

$$f(\Phi) = \frac{f(a, \Phi)}{f(a)} = \frac{1}{2\pi}\; ; \qquad 0 \le \Phi \le 2\pi \tag{4.18}$$

The phase in the envelope presentation (4.2) follows a uniform distribution from 0 to $2\pi$.
This concludes the consideration of the PDFs of the envelope and the phase.
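
The results (4.17) and (4.18) can be illustrated with a small Monte Carlo experiment. The sketch below assumes that, at a fixed time, $x$ and $y$ are independent normal variables with the same variance, as established above; the variance value, the sample size and the random seed are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
Vx = 2.0                      # assumed common variance of x and y (illustrative)
n = 200_000

x = rng.normal(0.0, np.sqrt(Vx), n)
y = rng.normal(0.0, np.sqrt(Vx), n)
a = np.hypot(x, y)                                  # envelope, formula (4.4)
phase = np.mod(np.arctan2(y, x), 2.0 * np.pi)       # phase mapped to [0, 2*pi)

mean_theory = np.sqrt(0.5 * np.pi * Vx)             # mean of the Rayleigh distribution

# Empirical CDF of the envelope against the Rayleigh CDF 1 - exp(-a^2 / (2 Vx))
a_sorted = np.sort(a)
ecdf = np.arange(1, n + 1) / n
cdf_theory = 1.0 - np.exp(-a_sorted**2 / (2.0 * Vx))
max_cdf_gap = np.max(np.abs(ecdf - cdf_theory))
```

The largest gap between the empirical and Rayleigh CDFs stays at the level of sampling noise, and the sample mean of the phase is close to $\pi$, as expected for a uniform distribution on $[0, 2\pi]$.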

4.2.2.  Autocorrelation  Function  of  the  Envelope 

To find the autocorrelation function of the envelope, the joint distribution of two
values of the envelope, $a(t)$ and $a(t+\tau)$, needs to be obtained first. This can be done through the
four-dimensional distribution of the values of $x$ and $y$ at the time instants $t$ and $t+\tau$. Consider a
system of four random variables:

$$U = \left(x(t),\; x(t+\tau),\; y(t),\; y(t+\tau)\right) \tag{4.19}$$

These random variables are values of the original and complementary stochastic
processes at two time instants, $t$ and $t+\tau$. As the processes $x$ and $y$ are normal, all four
variables have normal distributions. The processes $x$ and $y$ are independent at a fixed time; that means
that the variables $x(t)$ and $y(t)$ are independent, as well as the variables $x(t+\tau)$ and $y(t+\tau)$.
It does not mean, however, that the variables $x(t)$ and $y(t+\tau)$ are independent. On the contrary,
they are dependent and their correlation coefficient is expressed using formula (4.12) as:

$$m\left(x(t), y(t+\tau)\right) = \frac{1}{2V_x} \sum_{i=1}^{N_\omega} r_{Wi}^2 \cos(\omega_i\tau - 0.5\pi) = \frac{1}{2V_x} \sum_{i=1}^{N_\omega} r_{Wi}^2 \sin(\omega_i\tau) \tag{4.20}$$

A similar conclusion can be reached for the other "cross-pair" of the random variables,
$x(t+\tau)$ and $y(t)$:

$$m\left(x(t+\tau), y(t)\right) = \frac{1}{2V_x} \sum_{i=1}^{N_\omega} r_{Wi}^2 \cos(\omega_i\tau + 0.5\pi) = -\frac{1}{2V_x} \sum_{i=1}^{N_\omega} r_{Wi}^2 \sin(\omega_i\tau) \tag{4.21}$$

It is convenient to express these quantities as:

$$r(\tau) = -m\left(x(t+\tau), y(t)\right) = m\left(x(t), y(t+\tau)\right) = \frac{1}{2V_x} \sum_{i=1}^{N_\omega} r_{Wi}^2 \sin(\omega_i\tau) \tag{4.22}$$

The dependence between the random variables $x(t)$ and $x(t+\tau)$, as well as between $y(t)$ and
$y(t+\tau)$, can be expressed through the autocorrelation function of the processes $x$ and $y$,
which in the considered case is identical to the application of formula (4.12):

$$m\left(x(t), x(t+\tau)\right) = \frac{1}{2V_x} \sum_{i=1}^{N_\omega} r_{Wi}^2 \cos(\omega_i\tau) \tag{4.23}$$

$$m\left(y(t), y(t+\tau)\right) = \frac{1}{2V_y} \sum_{i=1}^{N_\omega} r_{Wi}^2 \cos(\omega_i\tau) \tag{4.24}$$

It is convenient to express these quantities as:

$$k(\tau) = m\left(x(t), x(t+\tau)\right) = m\left(y(t), y(t+\tau)\right) = \frac{1}{2V_x} \sum_{i=1}^{N_\omega} r_{Wi}^2 \cos(\omega_i\tau) \tag{4.25}$$

In fact, $k(\tau)$ is the normalized autocorrelation function, which is the same for the processes
$x(t)$ and $y(t)$.

The relationship between these variables is summarized with the following
covariance matrix:

$$C(\tau) = V_x \begin{pmatrix} 1 & k(\tau) & 0 & r(\tau) \\ k(\tau) & 1 & -r(\tau) & 0 \\ 0 & -r(\tau) & 1 & k(\tau) \\ r(\tau) & 0 & k(\tau) & 1 \end{pmatrix} \tag{4.26}$$

All these variables have a normal distribution; therefore the dependence between them
is completely characterized by the correlation expressed by the covariance matrix (4.26).
Next, their joint distribution is completely defined by the following 4-variate normal
distribution:

$$f(U) = \frac{1}{(2\pi)^2 \sqrt{\det(C)}} \exp\left(-\frac{1}{2} U^T C^{-1} U\right) = \frac{1}{(2\pi)^2 \sqrt{\det(C)}} \exp\left(-\frac{1}{2} \sum_{i=1}^{4} \sum_{j=1}^{4} C^{-1}_{ij} U_i U_j\right) \tag{4.27}$$


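Here the superscript $T$ stands for the transpose operation. It converts a column vector into
a row vector. $C^{-1}$ is the inverse covariance matrix. It is expressed as:

$$C^{-1}(\tau) = \frac{1}{V_x\, p(\tau)^2} \begin{pmatrix} 1 & -k(\tau) & 0 & -r(\tau) \\ -k(\tau) & 1 & r(\tau) & 0 \\ 0 & r(\tau) & 1 & -k(\tau) \\ -r(\tau) & 0 & -k(\tau) & 1 \end{pmatrix} \tag{4.28}$$

Here:

$$p(\tau)^2 = 1 - k(\tau)^2 - r(\tau)^2 \tag{4.29}$$

The determinant of the covariance matrix is:

$$\det(C(\tau)) = V_x^4 \left(1 - k(\tau)^2 - r(\tau)^2\right)^2 = V_x^4\, p(\tau)^4 \tag{4.30}$$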
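
The closed-form inverse (4.28) and determinant (4.30) of the covariance matrix (4.26) are easy to confirm numerically. The sketch below builds both matrices for one arbitrary, illustrative pair of values $k$ and $r$ satisfying $k^2 + r^2 < 1$; the function names and the numbers are assumptions made for the example, with the variables ordered as in (4.19).

```python
import numpy as np

def covariance(Vx, k, r):
    """Covariance matrix (4.26) for U = (x(t), x(t+tau), y(t), y(t+tau))."""
    return Vx * np.array([
        [1.0,   k, 0.0,   r],
        [  k, 1.0,  -r, 0.0],
        [0.0,  -r, 1.0,   k],
        [  r, 0.0,   k, 1.0],
    ])

def inverse_closed_form(Vx, k, r):
    """Closed-form inverse (4.28) with p^2 = 1 - k^2 - r^2 from (4.29)."""
    p2 = 1.0 - k**2 - r**2
    m = np.array([
        [1.0,  -k, 0.0,  -r],
        [ -k, 1.0,   r, 0.0],
        [0.0,   r, 1.0,  -k],
        [ -r, 0.0,  -k, 1.0],
    ])
    return m / (Vx * p2)

Vx, k, r = 1.7, 0.55, -0.30      # illustrative values with k^2 + r^2 < 1
C = covariance(Vx, k, r)
C_inv = inverse_closed_form(Vx, k, r)
p2 = 1.0 - k**2 - r**2
```

Multiplying `C` by `C_inv` returns the identity matrix, and `np.linalg.det(C)` agrees with $V_x^4 p^4$.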


Substitution of the formulae (4.28) and (4.30) into (4.27) leads to the following
expression for the joint distribution of the considered random variables:

$$f(U) = \frac{1}{(2\pi V_x)^2 p^2} \exp\left(-\frac{1}{2V_x p^2} \left(x_1^2 + x_2^2 + y_1^2 + y_2^2 - 2k(x_1 x_2 + y_1 y_2) - 2r(x_1 y_2 - y_1 x_2)\right)\right) \tag{4.31}$$

To avoid a bulky formula, the following nomenclature was used in formula (4.31):

$$x_1 = x(t)\; ; \quad x_2 = x(t+\tau)\; ; \quad y_1 = y(t)\; ; \quad y_2 = y(t+\tau) \tag{4.32}$$

Formula (4.31) describes the probability density in the four-dimensional space with
coordinates $x_1$, $x_2$, $y_1$, $y_2$. The next step is to rewrite it in the polar coordinates defined as
follows:

$$a = \sqrt{x^2 + y^2}\; ; \quad x = a\cos(\Phi)\; ; \qquad \Phi = \arctan\left(\frac{y}{x}\right)\; ; \quad y = a\sin(\Phi) \tag{4.33}$$

The new coordinates are:

$$a_1 = a(t)\; ; \quad a_2 = a(t+\tau)\; ; \quad \Phi_1 = \Phi(t)\; ; \quad \Phi_2 = \Phi(t+\tau) \tag{4.34}$$

To complete the transition, the two pairs of rectangular coordinates $(x_1, y_1)$ and $(x_2, y_2)$ are
substituted with $(a_1, \Phi_1)$ and $(a_2, \Phi_2)$. Then the expression needs to be multiplied by $a_1 a_2$,
as the element of area in polar coordinates is $a\,d\Phi\,da$.


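$$f(a_1, a_2, \Phi_1, \Phi_2) = \frac{a_1 a_2}{(2\pi V_x)^2 p^2} \exp\left(-\frac{1}{2V_x p^2} \left(a_1^2 + a_2^2 - 2k a_1 a_2 \cos(\Phi_2 - \Phi_1) - 2r a_1 a_2 \sin(\Phi_2 - \Phi_1)\right)\right) \tag{4.35}$$

The expression (4.35) can be further simplified by the substitution:

$$\gamma = \arctan\left(\frac{r}{k}\right) \tag{4.36}$$

Since $k^2 + r^2 = 1 - p^2$, this gives:

$$f(a_1, a_2, \Phi_1, \Phi_2) = \frac{a_1 a_2}{(2\pi V_x)^2 p^2} \exp\left(-\frac{1}{2V_x p^2} \left(a_1^2 + a_2^2 - 2 a_1 a_2 \sqrt{1 - p^2} \cos(\Phi_2 - \Phi_1 - \gamma)\right)\right) \tag{4.37}$$

The next step is to obtain the joint distribution of $a_1$ and $a_2$. It can be done by
integration of the distribution (4.37) over $\Phi_1$ and $\Phi_2$:

$$f(a_1, a_2) = \frac{a_1 a_2}{(2\pi V_x)^2 p^2} \int\limits_{0}^{2\pi} \int\limits_{0}^{2\pi} \exp\left(-\frac{1}{2V_x p^2} \left(a_1^2 + a_2^2 - 2 a_1 a_2 \sqrt{1 - p^2} \cos(\Phi_2 - \Phi_1 - \gamma)\right)\right) d\Phi_1\, d\Phi_2 \tag{4.38}$$

The integration can be completed:

$$f(a_1, a_2) = \frac{a_1 a_2}{V_x^2 p^2} \exp\left(-\frac{a_1^2 + a_2^2}{2V_x p^2}\right) I_0\left(\frac{a_1 a_2 \sqrt{1 - p^2}}{V_x p^2}\right) \tag{4.39}$$

Here $I_0$ is the zero-order modified Bessel function of the first kind (Abramowitz and
Stegun 1972).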

Finally, the autocorrelation function can be obtained from the PDF (4.39) using its
definition:

$$R_a(\tau) = \int\limits_{0}^{\infty} \int\limits_{0}^{\infty} f(a_1, a_2)\,(a_1 - m_a)(a_2 - m_a)\,da_1\,da_2 \tag{4.40}$$

Here $m_a$ is the mean value of the envelope. As it was shown that the envelope follows the
Rayleigh distribution, the mean value is known:

$$m_a = \int\limits_{0}^{\infty} a f(a)\,da = \frac{1}{V_x} \int\limits_{0}^{\infty} a^2 \exp\left(-\frac{a^2}{2V_x}\right) da = \sqrt{0.5\pi V_x} \tag{4.41}$$

The integration results in the following formula for the autocorrelation function:

$$R_a(\tau) = V_x \left(2E(1 - p^2) - p^2 K(1 - p^2) - 0.5\pi\right) \tag{4.42}$$

Here $K$ and $E$ are the complete elliptic integrals of the first and the second kind, respectively:

$$K(x) = \int\limits_{0}^{\pi/2} \frac{dz}{\sqrt{1 - x \sin^2 z}}\; ; \qquad E(x) = \int\limits_{0}^{\pi/2} \sqrt{1 - x \sin^2 z}\;dz \tag{4.43}$$

An example of the calculation of the autocorrelation function of the envelope, as well as
its comparison with a statistical estimate, is given in the subsection below.
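
Formula (4.42) can be evaluated with a short numerical sketch. It is assumed here that $E$ denotes the complete elliptic integral of the second kind and $K$ that of the first kind (the standard notation), both taken as functions of the parameter multiplying $\sin^2 z$. The integrals are computed with Gauss-Legendre quadrature; the function names and the number of nodes are illustrative choices.

```python
import numpy as np

# Gauss-Legendre nodes and weights mapped from [-1, 1] to [0, pi/2]
_nodes, _weights = np.polynomial.legendre.leggauss(100)
_z = 0.25 * np.pi * (_nodes + 1.0)
_w = 0.25 * np.pi * _weights

def ellip_e(m):
    """Complete elliptic integral of the second kind, parameter m in [0, 1]."""
    return np.sum(_w * np.sqrt(np.clip(1.0 - m * np.sin(_z) ** 2, 0.0, None)))

def ellip_k(m):
    """Complete elliptic integral of the first kind, parameter m in [0, 1)."""
    return np.sum(_w / np.sqrt(1.0 - m * np.sin(_z) ** 2))

def envelope_autocorrelation(p2, Vx):
    """Formula (4.42) as a function of p(tau)^2 and the variance Vx."""
    m = 1.0 - p2
    # p^2 * K(1 - p^2) tends to zero as p -> 0, so the term is dropped there
    k_term = p2 * ellip_k(m) if p2 > 0.0 else 0.0
    return Vx * (2.0 * ellip_e(m) - k_term - 0.5 * np.pi)
```

Two limits serve as a check. For $\tau = 0$ ($p = 0$) the function returns $(2 - 0.5\pi)V_x$, the variance of the Rayleigh distribution; for large $\tau$ ($p \to 1$) it returns zero, since distant values of the envelope are uncorrelated.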

4.2.3.  Distribution  of  the  Derivative  of  the  Envelope 

The theory of the envelope also offers the PDF of the derivative of the envelope.
This result may be important for an application of the upcrossing theory to the envelope.

To find the distribution of the derivative of the envelope, the joint distribution of
the envelope and its derivative needs to be found first and then integrated from zero to
infinity over the value of the envelope:

$$f(\dot{a}) = \int\limits_{0}^{\infty} f(a, \dot{a})\,da \tag{4.44}$$

The joint distribution of the envelope and its derivative can be derived from the joint
distribution of two values of the envelope (4.39). This problem can be classified as a
multivariate probability transformation, in which the distribution of one random vector is
derived from the distribution of another random vector. It also implies that these random
vectors are related through a deterministic vector-valued function:

$$\begin{pmatrix} a(t) \\ \dot{a}(t) \end{pmatrix} = \Psi \begin{pmatrix} a(t) \\ a(t+\tau) \end{pmatrix} \quad \text{or} \quad \begin{pmatrix} a_1 \\ \dot{a}_1 \end{pmatrix} = \Psi \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} \tag{4.45}$$
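Generally, a derivative is defined as a limit:

$$\dot{a}(t) = \lim_{\tau \to 0} \frac{a(t+\tau) - a(t)}{\tau} \quad \text{or} \quad \dot{a}_1 = \lim_{\tau \to 0} \frac{a_2 - a_1}{\tau} \tag{4.46}$$

Formula (4.46) represents one component of a vector-valued deterministic function of a
random vector; the other component is obvious:

$$\Psi \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} a_1 \\ \lim\limits_{\tau \to 0} \dfrac{a_2 - a_1}{\tau} \end{pmatrix} \tag{4.47}$$

Since the first component of the function (4.47) maps $a_1$ into itself and does not depend
on $\tau$, the symbol of the limit can be applied to the entire function:

$$\Psi \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \lim_{\tau \to 0} \begin{pmatrix} a_1 \\ \dfrac{a_2 - a_1}{\tau} \end{pmatrix} \tag{4.48}$$

Assume that $\tau$ is small. Then introduce an approximation for the function (4.48):

$$\Psi \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} \approx \Psi^{*} \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} a_1 \\ \dfrac{a_2 - a_1}{\tau} \end{pmatrix} \tag{4.49}$$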

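The formulation of the problem of multivariate probability transformation is
completed. Its solution is well known from the general theory of probability (see, for
example, Goodman, 1985):

$$f^{*}(a_1, \dot{a}_1) = \left|J\right| \, f\left(\Psi_1^{*-1}(a_1, \dot{a}_1),\; \Psi_2^{*-1}(a_1, \dot{a}_1)\right) \tag{4.50}$$

Here the vector-valued function $\Psi^{*-1}$ is the inverse of the vector-valued function $\Psi^{*}$, and $J$
stands for the determinant of the Jacobian matrix of the inverse function.

$$\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \Psi^{*-1} \begin{pmatrix} a_1 \\ \dot{a}_1 \end{pmatrix} = \begin{pmatrix} a_1 \\ a_1 + \dot{a}_1 \tau \end{pmatrix} \tag{4.51}$$

The second component of the inverse function (4.51) was formally derived from the
second component of (4.49):

$$\dot{a}_1 = \frac{a_2 - a_1}{\tau} \;\Leftrightarrow\; a_2 = a_1 + \dot{a}_1 \tau = \Psi_2^{*-1}(a_1, \dot{a}_1) \tag{4.52}$$

However, it also follows from the assumption that $\tau$ is small:

$$a(t+\tau) = a(t) + \tau\,\dot{a}(t) \quad \text{or} \quad a_2 = a_1 + \tau\,\dot{a}_1 \tag{4.53}$$

The determinant of the Jacobian matrix of the inverse function is expressed as:

$$J\left(\Psi^{*-1}\right) = \det \begin{pmatrix} \dfrac{\partial \Psi_1^{*-1}(a_1, \dot{a}_1)}{\partial a_1} & \dfrac{\partial \Psi_1^{*-1}(a_1, \dot{a}_1)}{\partial \dot{a}_1} \\[2ex] \dfrac{\partial \Psi_2^{*-1}(a_1, \dot{a}_1)}{\partial a_1} & \dfrac{\partial \Psi_2^{*-1}(a_1, \dot{a}_1)}{\partial \dot{a}_1} \end{pmatrix} = \det \begin{pmatrix} 1 & 0 \\ 1 & \tau \end{pmatrix} = \tau \tag{4.54}$$

Substitution of (4.54) and (4.51) into (4.50) leads to the following expression for the
approximate joint distribution:

$$f^{*}(a_1, \dot{a}_1) = \tau f(a_1, a_1 + \dot{a}_1 \tau) \tag{4.55}$$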


The exact distribution of the envelope and its derivative is actually the limit of
(4.55) when $\tau$ tends to zero:

$$f(a_1, \dot{a}_1) = \lim_{\tau \to 0} f^{*}(a_1, \dot{a}_1) = \lim_{\tau \to 0} \tau f(a_1, a_1 + \dot{a}_1 \tau) \tag{4.56}$$

Substitution of (4.39) into (4.56) yields (index 1 may be dropped now):

$$f(a, \dot{a}) = \lim_{\tau \to 0} \frac{\tau a (a + \tau\dot{a})}{V_x^2 p^2} \exp\left(-\frac{2a^2 + 2a\dot{a}\tau + \dot{a}^2\tau^2}{2V_x p^2}\right) I_0\left(\frac{a(a + \tau\dot{a})\sqrt{1 - p^2}}{V_x p^2}\right) \tag{4.57}$$

To carry out the limit transition in formula (4.57), $p$ needs to be considered in
more detail, as it is a function of $\tau$; see formula (4.29), repeated here for convenience:

$$p(\tau)^2 = 1 - k(\tau)^2 - r(\tau)^2$$

The function $k(\tau)$ is a normalized autocorrelation function. It was defined by
formula (4.25), which is based on discretization of the spectrum with the frequency set $\omega_i$,
$i = 1, \ldots, N_\omega$. Formally, it is the cosine Fourier transform of the spectral density $s(\omega)$:

$$k(\tau) = \frac{1}{V_x} \int\limits_{0}^{\infty} s(\omega) \cos(\omega\tau)\,d\omega \tag{4.58}$$

The function $r(\tau)$ is a normalized cross-correlation function. It was defined by
formula (4.22) based on the same discretization. It is the result of the sine Fourier transform of
the spectral density:

$$r(\tau) = \frac{1}{V_x} \int\limits_{0}^{\infty} s(\omega) \sin(\omega\tau)\,d\omega \tag{4.59}$$


Figure 4.4 shows the function $p(\tau)$ along with the normalized autocorrelation function
$k(\tau)$ and the normalized cross-correlation function $r(\tau)$ calculated for the numerical example.

Figure 4.4 Function p, normalized auto- and cross-correlation functions

As can be seen from Figure 4.4, the function $p(\tau)$ tends to zero as the time
duration $\tau$ decreases. To describe the behavior of this function near zero, it is convenient to expand
$p(\tau)^2$ into a Taylor series about the zero point (that is, a Maclaurin series):

$$p(\tau)^2 = p(0)^2 + \left.\frac{d\,p(\tau)^2}{d\tau}\right|_{\tau=0} \tau + \frac{1}{2} \left.\frac{d^2 p(\tau)^2}{d\tau^2}\right|_{\tau=0} \tau^2 + \ldots \tag{4.60}$$


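Consider each term of (4.60):

$$p(0)^2 = 1 - k(0)^2 - r(0)^2 = 1 - 1 - 0 = 0 \tag{4.61}$$

$$\left.\frac{d\,p(\tau)^2}{d\tau}\right|_{\tau=0} = \left.\frac{d}{d\tau}\left(1 - k(\tau)^2 - r(\tau)^2\right)\right|_{\tau=0} = \left.-2\left(k(\tau)\dot{k}(\tau) + r(\tau)\dot{r}(\tau)\right)\right|_{\tau=0} \tag{4.62}$$

$$\left.\frac{d^2 p(\tau)^2}{d\tau^2}\right|_{\tau=0} = \left.\frac{d}{d\tau}\left(-2k(\tau)\dot{k}(\tau) - 2r(\tau)\dot{r}(\tau)\right)\right|_{\tau=0} = \left.-2\left(k(\tau)\ddot{k}(\tau) + \dot{k}(\tau)^2 + r(\tau)\ddot{r}(\tau) + \dot{r}(\tau)^2\right)\right|_{\tau=0} \tag{4.63}$$

Values of the auto- and cross-correlation functions at $\tau = 0$ are expressed as:

$$k(0) = 1\; ; \qquad r(0) = 0 \tag{4.64}$$

Derivatives of the auto- and cross-correlation functions are:

$$\dot{k}(\tau) = -\frac{1}{V_x} \int\limits_{0}^{\infty} s(\omega)\,\omega \sin(\omega\tau)\,d\omega\; ; \qquad \ddot{k}(\tau) = -\frac{1}{V_x} \int\limits_{0}^{\infty} s(\omega)\,\omega^2 \cos(\omega\tau)\,d\omega \tag{4.65}$$

$$\dot{r}(\tau) = \frac{1}{V_x} \int\limits_{0}^{\infty} s(\omega)\,\omega \cos(\omega\tau)\,d\omega\; ; \qquad \ddot{r}(\tau) = -\frac{1}{V_x} \int\limits_{0}^{\infty} s(\omega)\,\omega^2 \sin(\omega\tau)\,d\omega \tag{4.66}$$

The values of these derivatives at $\tau = 0$ are expressed as:

$$\dot{k}(0) = -\frac{1}{V_x} \int\limits_{0}^{\infty} s(\omega)\,\omega \sin(\omega\tau = 0)\,d\omega = 0 \tag{4.67}$$

$$\dot{r}(0) = \frac{1}{V_x} \int\limits_{0}^{\infty} s(\omega)\,\omega \cos(\omega\tau = 0)\,d\omega = \frac{1}{V_x} \int\limits_{0}^{\infty} s(\omega)\,\omega\,d\omega = \omega_1 \tag{4.68}$$

The value $\dot{r}(0)$ is the mean frequency $\omega_1$ as determined from the spectral density.

$$\ddot{k}(0) = -\frac{1}{V_x} \int\limits_{0}^{\infty} s(\omega)\,\omega^2 \cos(\omega\tau = 0)\,d\omega = -\frac{1}{V_x} \int\limits_{0}^{\infty} s(\omega)\,\omega^2\,d\omega = -\omega_2^2 \tag{4.69}$$

The quantity $-\ddot{k}(0)$ has the meaning of the second moment of the spectral area,
normalized by the variance. Its usual nomenclature is $\omega_2^2$.

$$\ddot{r}(0) = -\frac{1}{V_x} \int\limits_{0}^{\infty} s(\omega)\,\omega^2 \sin(\omega\tau = 0)\,d\omega = 0 \tag{4.70}$$

Formulae (4.61) through (4.70) allow us to express the expansion (4.60) near $\tau = 0$ as:

$$
\begin{aligned}
p(\tau)^2 &= 0 - 2\tau\left(k(0)\dot{k}(0) + r(0)\dot{r}(0)\right) - \tau^2\left(k(0)\ddot{k}(0) + \dot{k}(0)^2 + r(0)\ddot{r}(0) + \dot{r}(0)^2\right) + \ldots \\
&\approx -2\tau\left(1 \cdot 0 + 0 \cdot \omega_1\right) - \tau^2\left(-1 \cdot \omega_2^2 + 0 + 0 \cdot 0 + \omega_1^2\right) = \tau^2\left(\omega_2^2 - \omega_1^2\right)
\end{aligned}
\tag{4.71}
$$

The above derivation, completed with formula (4.71), leads to an important
conclusion about the behavior of the function $p(\tau)$ near $\tau = 0$:

$$\lim_{\tau \to 0} p(\tau)^2 = \lim_{\tau \to 0} \tau^2\left(\omega_2^2 - \omega_1^2\right) \tag{4.72}$$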
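
The small-$\tau$ behavior stated in (4.71) and (4.72) can be checked against the discretized definitions (4.22), (4.25) and (4.29). The amplitude set below is a hypothetical example, not the spectrum used in the report; $\omega_1$ and $\omega_2^2$ are computed as the discrete counterparts of (4.68) and (4.69).

```python
import numpy as np

# Hypothetical discretized spectrum: r2 holds squared component amplitudes
omega = np.linspace(0.4, 1.6, 60)
r2 = (0.2 * np.exp(-((omega - 1.0) / 0.25) ** 2)) ** 2
Vx = 0.5 * r2.sum()                                   # variance, formula (4.11)

def k_fun(tau):
    """Normalized autocorrelation function, formula (4.25)."""
    return (r2 * np.cos(omega * tau)).sum() / (2.0 * Vx)

def r_fun(tau):
    """Normalized cross-correlation function, formula (4.22)."""
    return (r2 * np.sin(omega * tau)).sum() / (2.0 * Vx)

def p2_fun(tau):
    """p(tau)^2 from formula (4.29)."""
    return 1.0 - k_fun(tau) ** 2 - r_fun(tau) ** 2

omega1 = (r2 * omega).sum() / (2.0 * Vx)              # mean frequency
omega2_sq = (r2 * omega**2).sum() / (2.0 * Vx)        # normalized second moment

tau = 1.0e-2
ratio = p2_fun(tau) / (tau**2 * (omega2_sq - omega1**2))
```

For a small time lag the ratio of the exact $p(\tau)^2$ to the approximation $\tau^2(\omega_2^2 - \omega_1^2)$ is very close to one, while $p(0)^2$ is exactly zero.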

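Consider the behavior of the argument of the Bessel function in (4.57) near $\tau = 0$:

$$\lim_{\tau\to 0}\frac{a(a+\tau\dot{a})\sqrt{1-p(\tau)^2}}{V_x\,p(\tau)^2} = \lim_{\tau\to 0}\frac{a(a+\tau\dot{a})\sqrt{1-\tau^2(\omega_2^2-\omega_1^2)}}{V_x\,\tau^2(\omega_2^2-\omega_1^2)} = \lim_{\tau\to 0}\left(\frac{a^2}{V_x\,\tau^2(\omega_2^2-\omega_1^2)} + \frac{a\dot{a}}{V_x\,\tau(\omega_2^2-\omega_1^2)}\right) = \infty \tag{4.73}$$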


Using the known property of the modified Bessel function of the first kind for large arguments:

$$\lim_{x\to\infty} I_0(x) = \frac{\exp(x)}{\sqrt{2\pi x}} \tag{4.74}$$
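
The quality of the large-argument approximation (4.74) can be illustrated with a short check. The sketch below evaluates $I_0(x)$ by its power series and compares it with $\exp(x)/\sqrt{2\pi x}$; the function names and the number of series terms are assumptions chosen for this illustration.

```python
import math

def bessel_i0(x, terms=200):
    """Modified Bessel function of the first kind, order zero (power series)."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        # Ratio of consecutive series terms: (x/2)^2 / (k+1)^2.
        term *= (x / 2.0) ** 2 / ((k + 1) ** 2)
    return total

def bessel_i0_asymptotic(x):
    """Leading term of the large-argument expansion, formula (4.74)."""
    return math.exp(x) / math.sqrt(2.0 * math.pi * x)

for x in (5.0, 10.0, 50.0):
    print(x, bessel_i0(x) / bessel_i0_asymptotic(x))
```

The ratio approaches one as the argument grows, which is the regime of interest here because the argument of the Bessel function tends to infinity when $\tau \to 0$.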


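This allows substituting the Bessel function with its approximation in formula (4.57):

$$f(a,\dot{a}) = \lim_{\tau\to 0}\left[\frac{\tau\,a(a+\tau\dot{a})}{V_x^2\,p(\tau)^2}\cdot\frac{\sqrt{V_x\,p(\tau)^2}}{\sqrt{2\pi a(a+\tau\dot{a})\sqrt{1-p(\tau)^2}}}\,\exp\left(-\frac{a^2}{2V_x\,p(\tau)^2}\right)\exp\left(-\frac{(a+\tau\dot{a})^2}{2V_x\,p(\tau)^2}\right)\exp\left(\frac{a(a+\tau\dot{a})\sqrt{1-p(\tau)^2}}{V_x\,p(\tau)^2}\right)\right] \tag{4.75}$$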


Substitution of the approximation (4.71) for the function $p(\tau)$ into equation (4.75) allows completing the evaluation of the limit:

$$f(a,\dot{a}) = \frac{a}{V_x}\exp\left(-\frac{a^2}{2V_x}\right)\cdot\frac{1}{\sqrt{2\pi V_x(\omega_2^2-\omega_1^2)}}\exp\left(-\frac{\dot{a}^2}{2V_x(\omega_2^2-\omega_1^2)}\right) \tag{4.76}$$

The structure of formula (4.76) reveals the independence of the envelope and its derivative (see formula (4.17)). This is an expected result: since the process x is stationary, its envelope can also be expected to be stationary, and the value of a stationary process is uncorrelated with its derivative taken at the same instant.

$$f(a,\dot{a}) = f(a)\,f(\dot{a}) \tag{4.77}$$

Finally, the distribution of the derivative is expressed as:

$$f(\dot{a}) = \frac{1}{\sqrt{2\pi V_x(\omega_2^2-\omega_1^2)}}\exp\left(-\frac{\dot{a}^2}{2V_x(\omega_2^2-\omega_1^2)}\right) \tag{4.78}$$

It is a normal distribution with zero mean and the following variance:

$$V_{\dot{a}} = V_x\left(\omega_2^2-\omega_1^2\right) \tag{4.79}$$

4.2.4.  Numerical  Example 

The envelope is a stochastic process; therefore it makes sense to start its numerical exploration by comparing the statistical estimate of its autocorrelation function with its theoretical counterpart (4.42).

The  values  of  the  envelope  were  computed  for  each  time  step  with  formula  (4.4). 
The  estimate  of  its  mean  value  is  expressed  as: 


Calculation of the estimate of the autocorrelation function may encounter significant difficulties for larger values of time lag due to insufficient data. Averaging the estimate over the records alleviates this problem (Belenky et al. 2007):

$$R_k^* = \frac{1}{N_R}\sum_{j=1}^{N_R}\frac{1}{N-k}\sum_{i=1}^{N-k}\left(a_{i,j}-m_a^*\right)\left(a_{i+k,j}-m_a^*\right); \quad k = 1,\ldots,N-1 \tag{4.81}$$
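
The record-averaged estimate can be sketched in a few lines. The code below is a minimal illustration, assuming that all records have the same length and share one mean estimate; the function name `ensemble_autocovariance` and the tiny data set are hypothetical and serve only to show the bookkeeping of the two sums.

```python
def ensemble_autocovariance(records, max_lag):
    """Autocovariance estimate averaged over records.

    records: list of equally long lists, one list per record.
    Returns the common mean estimate and the estimates for lags 0..max_lag.
    """
    n_rec = len(records)
    n = len(records[0])
    mean = sum(sum(rec) for rec in records) / (n_rec * n)
    estimates = []
    for k in range(max_lag + 1):
        acc = 0.0
        for rec in records:
            # Average of lagged products within one record (n - k pairs).
            acc += sum((rec[i] - mean) * (rec[i + k] - mean)
                       for i in range(n - k)) / (n - k)
        # Average over the ensemble of records.
        estimates.append(acc / n_rec)
    return mean, estimates

records = [[1.0, 2.0, 1.0, 2.0], [2.0, 1.0, 2.0, 1.0]]
mean, cov = ensemble_autocovariance(records, 2)
print(mean, cov)
```

Dividing every estimate by the lag-zero value gives the normalized autocorrelation function. For large lags only a few pairs are available in each record, which is why averaging over the records helps.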


Figure 4.5 shows the theoretical autocorrelation function of the envelope (4.42) and its statistical estimate (4.81). Although in-depth statistical analysis was not performed, it is clear from this figure that both the shape of the autocorrelation function and its time of decay are fairly close.

Figure 4.5 Normalized autocorrelation functions of the envelope

As the ultimate purpose of using the envelope is the analysis of upcrossings, it makes sense to check the distributions of the envelope and its derivative. This can be done using the Pearson chi-square goodness-of-fit test. However, in order to use this test, all the points included in the sample must represent independent data. The points $a_i$ are dependent, as the process of the envelope has a certain memory represented by the autocorrelation function shown in Figure 4.5. To provide the goodness-of-fit test with independent data, Belenky et al. (2007) used a skipping procedure, with a time interval sufficient for the autocorrelation function to decay. In this case it may be about 30 seconds, which corresponds to 60 steps. So only one point out of every 60 steps is included in the sample.

Figure 4.6 and Figure 4.7 show the distributions of the envelope and its derivative, respectively. The results of the Pearson chi-square goodness-of-fit test are included in these figures. The test was passed in both cases: it does not reject the theoretical distributions, which indicates that the theory of the envelope was correctly interpreted and applied in this case.


Figure 4.6 Distribution of the envelope

4.3. Application of the Theory of Upcrossing to the Envelope

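To obtain the theoretical upcrossing rate, the distributions of the envelope and its derivative need to be substituted into the general formula for the upcrossing rate of a stationary process:

$$\xi = f(a=b)\int_0^\infty \dot{a}\,f(\dot{a})\,d\dot{a} = \frac{b}{V_x}\exp\left(-\frac{b^2}{2V_x}\right)\sqrt{\frac{V_x(\omega_2^2-\omega_1^2)}{2\pi}} \tag{4.82}$$

Here, b is the level of crossing.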
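
A short numerical check of the upcrossing rate may be useful. The sketch below assumes the Rayleigh density for the envelope and the zero-mean normal density (4.78) for its derivative; under these assumptions the integral of $\dot{a} f(\dot{a})$ over positive values equals $\sqrt{V_{\dot{a}}/(2\pi)}$. The function names and the input values are hypothetical and are not taken from the numerical example of this report.

```python
import math

def envelope_upcrossing_rate(b, v_x, w1, w2_sq):
    """Closed form: Rayleigh density at level b times the mean positive derivative."""
    rayleigh_pdf = b / v_x * math.exp(-b * b / (2.0 * v_x))
    var_dot = v_x * (w2_sq - w1 * w1)      # variance of the envelope derivative
    return rayleigh_pdf * math.sqrt(var_dot / (2.0 * math.pi))

def envelope_upcrossing_rate_numeric(b, v_x, w1, w2_sq, n=20000):
    """Same rate with the integral evaluated by the midpoint rule."""
    rayleigh_pdf = b / v_x * math.exp(-b * b / (2.0 * v_x))
    var_dot = v_x * (w2_sq - w1 * w1)
    h = 10.0 * math.sqrt(var_dot) / n      # integrate up to ten standard deviations
    integral = 0.0
    for i in range(n):
        a_dot = (i + 0.5) * h
        pdf = math.exp(-a_dot ** 2 / (2.0 * var_dot)) / math.sqrt(2.0 * math.pi * var_dot)
        integral += a_dot * pdf * h
    return rayleigh_pdf * integral

# Hypothetical inputs: level 9 m, variance 5 m^2, w1 = 0.9 rad/s, w2^2 = 0.9 (rad/s)^2.
closed = envelope_upcrossing_rate(9.0, 5.0, 0.9, 0.9)
numeric = envelope_upcrossing_rate_numeric(9.0, 5.0, 0.9, 0.9)
print(closed, numeric)
```

The two values should agree to many digits, which confirms only the evaluation of the integral, not the applicability of the underlying distributions.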

Evaluation of the statistical estimate of the upcrossing rate does not differ from the procedure described in Section 1. Numerical results are shown in Figure 4.8. The theoretical value and the statistical estimates agree, as the theoretical value is inside the confidence interval.

Figure 4.8 Theoretical and statistical rate of upcrossing of the envelope. Level of crossing b = 9 m, total number of upcrossings is 302

Following the method developed in Section 1, the applicability of Poisson flow has been tested for the 9 m crossing level, see Figure 4.9. As seen from this figure, the observed upcrossings of the envelope do not reject the Poisson distribution. Calculations for different levels are summarized in Table 7. As seen from this table, the upcrossings of the envelope stop following Poisson flow somewhere between the levels of 7 and 7.5 m. This is actually higher than for the process itself: as was shown in Section 1, Poisson flow lost applicability between the levels of 5.25 and 5 m.
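
The bookkeeping behind such a test can be sketched as follows. The code counts upcrossings in consecutive time windows and forms the ratio of the mean to the variance of the counts, which should be close to one if the counts follow a Poisson distribution. This is only an illustration under simple assumptions (equally spaced samples, non-overlapping windows, unbiased variance estimate); the function names are hypothetical and the procedure of Section 1 may differ in detail.

```python
import math

def upcrossings_per_window(series, level, steps_per_window):
    """Count upcrossings of `level` in consecutive non-overlapping windows."""
    counts = []
    n_steps = len(series) - 1              # a step joins two neighbouring samples
    for start in range(0, n_steps - steps_per_window + 1, steps_per_window):
        counts.append(sum(1 for i in range(start, start + steps_per_window)
                          if series[i] < level <= series[i + 1]))
    return counts

def mean_variance_ratio(counts):
    """Mean, unbiased variance and their ratio for a sample of counts."""
    n = len(counts)
    m = sum(counts) / n
    v = sum((c - m) ** 2 for c in counts) / (n - 1)
    return m, v, m / v

def poisson_pmf(k, mean):
    """Probability of exactly k events for a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

series = [0, 2, 0, 2, 0, 2, 0, 0, 0, 0, 0, 0, 0]
counts = upcrossings_per_window(series, 1, 6)
m, v, ratio = mean_variance_ratio(counts)
print(counts, m, v, ratio)
```

The histogram of the counts is then compared with `poisson_pmf` evaluated at the estimated mean, for example with the Pearson chi-square test.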


Figure 4.9 Probability mass function of the number of upcrossings of the envelope during a time window. Crossing level 9 m; total 302 upcrossings; number of time windows per record N_W = 5; duration of time window T_k = 360 s; volume of sample 1000; estimate of mean value 0.302; estimate of variance 0.2911; ratio of mean to variance 1.0375. Pearson chi-square test: theoretical probability mass function χ² = 1.629, d = 3, P(χ², d) = 0.653; based on average number per unit of time χ² = 1.63, d = 2, P(χ², d) = 0.443.


Obviously, using the envelope in this case does not help with the application of Poisson flow for the lower levels. However, the numerical example used so far considered waves derived from a Bretschneider spectrum. It is a model for fully developed waves in unrestricted waters, and the spectrum is not exactly narrow. This situation changes completely when the encounter spectrum in following or stern-quartering waves is considered. This is the content of the next subsection.

Table 7. Evaluation of applicability of Poisson flow for the upcrossing by the envelope

Pearson chi-square test for the Poisson distribution based on: (A) Formula (83); (B) averaged number of crossings.

Level, m   Number of crossings   N_W   m_k/V_k   N_max   χ² (A)     d (A)   P(χ²,d) (A)   χ² (B)     d (B)   P(χ²,d) (B)
11         31                    1     0.958     3       0.8147     2       0.665         0.5251     1       0.469
10         108                   1     1.1       4       4.210      3       0.2396        3.239      2       0.198
9          302                   5     1.0375    4       1.629      3       0.653         1.63       2       0.443
8          787                   10    1.047     4       1.6636     3       0.645         1.382      2       0.501
7.5        1224                  25    1.0111    4       0.6589     3       0.8828        0.301      2       0.8605
------------------------------------------------------------------------------------------------------------------
7          1799                  25    1.0683    4       9.4395     3       0.024         9.0013     2       0.0111
6.75       2210                  25    1.1008    4       15.4754    3       0.0015        15.8483    2       0.0004
6.5        2632                  25    1.135     4       25.1708    3       1.42E-5       25.9098    2       2.36E-5
5          6121                  40    1.5325    4       554.98     3       0             561.14     2       0

4.4.  Effect  of  Speed  and  Wave  Direction 

4.4.1. Encounter Spectrum of Waves

The wave excitation acting on a ship depends on speed and wave direction, even if the consideration is limited to Froude-Krylov forces and moments. The effect is caused by the relative motion of the wave and the ship. It is a particular case of the Doppler effect: the frequency increases when the source and the recipient of a signal move towards each other and decreases when they move away from each other. This effect is known in naval architecture as the encounter spectrum, which becomes wider in head and oblique waves and narrower in following and stern-quartering seas.

The calculation of the encounter spectrum and its effect on ship motions are described in detail by Kobylinski and Kastner (2003). The encounter spectral density $s_e$ can be calculated using the series of formulae below:

$$\omega_e = \omega - \frac{\omega^2}{g}V_s\cos\beta \tag{4.83}$$

Here $\omega_e$ is the frequency of encounter, $\omega$ is the true wave frequency, $\beta$ is the wave heading angle, and $V_s$ is the speed of the ship (in m/s, if S.I. is used).

$s_e(\omega_e) =$

With the following expression for the parameters:

Figure 4.10 shows the encounter spectrum for the numerical example (see subsection 1.2.3), calculated with formulae (4.83)-(4.86) for pure following waves (β = 0) and a speed of 15 knots. The effect of speed and wave direction is dramatic.

These calculations are much simpler when the spectrum is already presented as a set of component amplitudes and frequencies; the new set of frequencies consists of the absolute values of the encounter frequencies (4.83):

$$\zeta_w(t) = \sum_{i=1}^{N_\omega} r_{wi}\cos\left(|\omega_{ei}|\,t + \varphi_i\right) \tag{4.87}$$
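
The component-based calculation can be sketched directly from (4.83). The code below is a minimal illustration: the constants, the function names and the sample frequencies are assumptions, and the heading convention follows the text (β = 0 for pure following waves).

```python
import math

G = 9.81          # gravity acceleration, m/s^2
KNOT = 0.514444   # one knot expressed in m/s

def encounter_frequency(omega, speed, heading):
    """Formula (4.83); heading = 0 corresponds to pure following waves."""
    return omega - omega ** 2 / G * speed * math.cos(heading)

def encounter_elevation(t, amplitudes, omegas, phases, speed, heading):
    """Sum of cosine components taken at absolute encounter frequencies."""
    return sum(r * math.cos(abs(encounter_frequency(w, speed, heading)) * t + ph)
               for r, w, ph in zip(amplitudes, omegas, phases))

speed = 15.0 * KNOT
for w in (0.4, 0.6, 1.0, 1.5):
    # Following waves: the encounter frequency is lower than the true one
    # and becomes negative for components slower than the ship.
    print(w, encounter_frequency(w, speed, 0.0))
```

A component with the true frequency g/V_s has zero encounter frequency in pure following waves, that is, it travels together with the observer. Components with higher true frequencies obtain negative encounter frequencies, which is why absolute values are taken.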

Figure 4.10 Encounter (red) and true (blue) spectra of waves for pure following waves (β = 0) and speed of 15 knots


4.4.2.  Time  History  and  the  Envelope 

The time history of the 19th record is shown in Figure 4.11. The upper part (a) of the figure shows the original process “recorded” by a fixed “gauge”. The lower part (b) is “recorded” by a “gauge” moving in pure following waves (β = 0°) with a speed of 15 knots. There is a significant visual difference between these two time histories. The effect of speed and direction leads to the appearance of groups or clusters. These clusters may create problems with Poisson flow: if there is an upcrossing in one period, the next period is very likely to have one too. This breaks the requirement of independence of these events, as the autocorrelation function still has significant values after one period.

At the same time, the autocorrelation function decays at a significantly slower pace: compare Figure 4.12 with a similar figure from Section 1. The autocorrelation function in Figure 4.12 retains nonzero values even at the end of the record. This is a result of the “moving” gauge: there may be a component that moves with a celerity very close to the speed of the “gauge”. It may take a very long time (up to eternity) for such a component to pass the “gauge”, and therefore its influence can be felt for a correspondingly long time. As a result, it is not obvious how long it takes for the autocorrelation to die out; if such a parameter is still required, it can be set based on a practical consideration such as “when the peaks of the autocorrelation function become less than 10%”. For this example it takes 400 seconds.

Figure 4.13 shows the same record 19 along with its envelope and makes visible yet another effect of speed and direction: the envelope becomes a slowly changing process in comparison with the wave elevations, even those “recorded” by the moving “gauge”. This effect is expected, as the spectrum becomes narrow-banded.

Figure 4.11 Time history of record 19 of the numerical example wave for zero speed (a) and for pure following waves (β = 0) and speed of 15 knots (b)

Figure 4.12 Ensemble-averaged normalized autocorrelation function, evaluated for the entire length of a record (a) and zoomed in on the first 200 seconds (b) for the process of wave elevations recorded from a “gauge” moving in following seas with the speed of 15 knots

Figure 4.13 Time history of record 19 of the numerical example wave for pure following waves (β = 0) and speed of 15 knots with the envelope. The zoomed-in fragment shows how the envelope becomes a slowly changing process.


4.4.3.  Applicability  of  Poisson  Flow 

The most dramatic effect of speed and heading is on the applicability of Poisson flow. As noted above, strong clustering of periods leads to similar clustering of upcrossings, which violates the independence requirement and renders Poisson flow inapplicable. Figure 4.14 shows the distribution of the time intervals between upcrossings. As expected, none of the hypotheses is supported by the data. The histogram does not resemble an exponential distribution at all. The first bin is much taller than the other bins, showing that the distribution is dominated by time intervals close to the mean encounter period (22 s). These calculations were done for the crossing level of 7.5 m, where, without the influence of speed and heading, the applicability of Poisson flow did not raise any doubts, see Figure 1.21.

The direct test of the applicability of Poisson flow (Figure 4.15) also rejects the hypothesis. This distribution is also dominated by the first bin, corresponding to zero upcrossings during the time window. The height of the second bin is almost equal to the heights of the third and the fourth. It means that the numbers of windows with one, two, or three crossings are almost equal. In the case of a Poisson distribution they are expected to decrease with the number of crossings.

Figure 4.14 Distribution of time intervals between upcrossings for pure following waves, speed 15 knots, level of crossing 7.5 m


These calculations were carried out systematically for crossing levels ranging from 11 m down to 5.5 m and compared with the similar calculation for the envelope. The results are summarized in Table 8. There was no level where Poisson flow could be applied to the process of wave elevations recorded from a “gauge” moving in following seas with the speed of 15 knots.

At the same time, the applicability of Poisson flow to the envelope is easily achievable and holds all the way down to a level between 6.25 and 6.5 m. This is very close to the result obtained in Section 1 for the wave elevation “recorded” by a fixed “gauge”. Therefore the envelope can be used for detecting upcrossing events when the spectrum is narrow and Poisson flow cannot be applied directly to the process.


Figure 4.15 Probability mass function of the number of upcrossings during a time window for pure following waves, speed 15 knots. Crossing level 7.5 m; total 420 crossings, 147 records had at least one crossing; number of time windows per record N_W = 10; duration of time window T_k = 180 s; volume of sample 2000; estimate of mean value 0.21; estimate of variance 0.4041; ratio of mean to variance 0.5197. Pearson chi-square test, with P(χ², d) = 0 in all cases: theoretical probability mass function χ² = 5321, d = 6; based on average number per unit of time χ² = 6670, d = 5; based on average time between crossings χ² = 642, d = 5; based on average censored time before first crossing χ² = 4007, d = 5; based on average uncensored time before first crossing χ² = 11384, d = 5.

Table 8. Applicability of Poisson flow for the case of following waves with 15 knots: the process vs. its envelope

4.5. The Envelope Based on Peaks

4.5.1. Appearance and Distribution for Zero-Speed Case

As was shown earlier in this section, the peaks of the envelope do not necessarily correspond to the peaks of the real process, as can be seen in Figure 4.2 and Figure 4.3. This effect is much less pronounced when the spectrum is narrow, as in the case of waves “recorded” by a “gauge” moving in the same direction (pure following waves) with the speed of 15 knots, see Figure 4.13.

Consider an approximation for the envelope that does not produce artificial peaks; each upcrossing of the level then must correspond to at least one upcrossing by the process itself. Such an approximation can be achieved by a piecewise linear function using the absolute values of the peaks of the process as nodes. Figure 4.16 shows this approximation along with the true envelope, with a zoomed-in picture shown in Figure 4.17. As can be seen from Figure 4.17, the true envelope oscillates about the piecewise linear approximation.
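
The construction of such a piecewise linear approximation can be sketched as follows. The code is a simplified illustration: it assumes that the peaks are taken as the local extrema (maxima and minima) of the sampled process, ignores flat peaks formed by equal neighbouring samples, and holds the end values outside the first and last node. The function names are hypothetical.

```python
def peak_nodes(t, x):
    """Nodes (time, |x|) at the local extrema of the sampled process."""
    return [(t[i], abs(x[i])) for i in range(1, len(x) - 1)
            if (x[i] - x[i - 1]) * (x[i + 1] - x[i]) < 0]

def peak_based_envelope(nodes, tq):
    """Piecewise linear interpolation between the nodes at time tq."""
    if tq <= nodes[0][0]:
        return nodes[0][1]                 # before the first node: hold its value
    for (t0, a0), (t1, a1) in zip(nodes, nodes[1:]):
        if tq <= t1:
            return a0 + (a1 - a0) * (tq - t0) / (t1 - t0)
    return nodes[-1][1]                    # after the last node: hold its value

t = list(range(9))
x = [0, 1, 0, -2, 0, 3, 0, -1, 0]
nodes = peak_nodes(t, x)
print(nodes)
print([peak_based_envelope(nodes, tq) for tq in (2, 4, 6)])
```

Because the nodes are values actually attained by the process, the approximation cannot rise above a level unless the absolute value of the process itself does.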


Figure 4.16 Peak-based or piecewise linear approximation of the envelope; shown for the record # 19; true envelope is shown with the dotted line. The wave is “recorded” from a fixed point.

Figure 4.17 Zoomed-in peak-based or piecewise linear approximation of the envelope; shown for the record # 19; true envelope is shown with the dashed line. The wave is “recorded” from a fixed point.

Distribution  of  the  peak-based  envelope  is  shown  in  Figure  4.18.  The  current 
value  of  the  peak-based  envelope  is  calculated  linearly  between  the  nodes.  Comparing  the 
histogram  with  the  theoretical  Rayleigh  distribution,  one  can  find  visual  similarity. 
However,  the  Pearson  chi-square  goodness  of  fit  test  does  not  support  this  hypothesis. 
Numerical  disruptions  caused  by  linear  interpolation  seem  to  be  statistically  significant  in 
this  case. 


Figure 4.18 Distribution of the peak-based envelope for the zero-speed case. Skip 30 seconds


Calculation of the distribution of the derivatives was carried out as follows. A cubic spline with free ends was fitted through the peaks of the process (Forsythe et al. 1977):

$$y = y_i + b_i(x - x_i) + c_i(x - x_i)^2 + d_i(x - x_i)^3 \tag{4.88}$$

Here $(x_i, y_i)$ are the coordinates of the nodes, and $b_i$, $c_i$, $d_i$ are the spline coefficients. Once the spline was fitted, the derivative can be expressed as:

$$y' = b_i + 2c_i(x - x_i) + 3d_i(x - x_i)^2 \tag{4.89}$$

The values of the derivative at each node are simply:

$$y'_i = b_i \tag{4.90}$$

The values of the derivative between the nodes were calculated with linear interpolation at each time step. The distribution of the derivatives of the peak-based envelope is shown in Figure 4.19. The theoretical distribution (4.78) is not supported by the Pearson chi-square goodness-of-fit test. Visually, however, the distribution appears to be normal, but with a significantly smaller variance.
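
The spline step can be illustrated with a compact implementation. The sketch below assumes that "free ends" means zero second derivative at both end nodes (a natural spline) and solves the resulting tridiagonal system with the Thomas algorithm; it is not the routine of Forsythe et al., and the function names are hypothetical.

```python
def natural_spline(x, y):
    """Coefficients b, c, d of a natural cubic spline on each segment:
    y = y_i + b_i (x - x_i) + c_i (x - x_i)^2 + d_i (x - x_i)^3."""
    n = len(x) - 1                          # number of segments
    h = [x[i + 1] - x[i] for i in range(n)]
    m = [0.0] * (n + 1)                     # second derivatives, zero at free ends
    if n > 1:
        diag = [2.0 * (h[j] + h[j + 1]) for j in range(n - 1)]
        rhs = [6.0 * ((y[j + 2] - y[j + 1]) / h[j + 1] - (y[j + 1] - y[j]) / h[j])
               for j in range(n - 1)]
        for j in range(1, n - 1):           # forward elimination
            w = h[j] / diag[j - 1]
            diag[j] -= w * h[j]
            rhs[j] -= w * rhs[j - 1]
        m[n - 1] = rhs[n - 2] / diag[n - 2]
        for j in range(n - 3, -1, -1):      # back substitution
            m[j + 1] = (rhs[j] - h[j + 1] * m[j + 2]) / diag[j]
    b = [(y[i + 1] - y[i]) / h[i] - h[i] * (2.0 * m[i] + m[i + 1]) / 6.0
         for i in range(n)]
    c = [m[i] / 2.0 for i in range(n)]
    d = [(m[i + 1] - m[i]) / (6.0 * h[i]) for i in range(n)]
    return b, c, d

def node_derivatives(x, y):
    """Derivative at every node: b_i, plus the end slope of the last segment."""
    b, c, d = natural_spline(x, y)
    h_last = x[-1] - x[-2]
    return b + [b[-1] + 2.0 * c[-1] * h_last + 3.0 * d[-1] * h_last ** 2]

xs = [0.0, 1.0, 2.0, 4.0]
ys = [0.0, 1.0, 0.0, 2.0]
print(node_derivatives(xs, ys))
```

The derivative values at the nodes can then be interpolated linearly to every time step, as described above.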


Figure 4.19 Distribution of the derivative of the peak-based envelope for the zero-speed case. Skip 30 seconds

The piecewise linear approximation of the envelope, further referred to as the peak-based envelope, allows us to avoid the artificial peaks that can be found in the true envelope. Every peak of this approximated envelope corresponds to an upcrossing of the level by the original process. However, in the zero-speed case, the numerical disruptions introduced by the approximation lead to a deviation of the distributions of the peak-based envelope and its derivative from the theoretical PDFs.

4.5.2. Appearance and Distribution of the Peak-Based Envelope for Narrow Spectrum

For the process of encounter waves described by the spectrum in Figure 4.10, the appearance of the peak-based envelope is shown in Figure 4.20. The peak-based envelope becomes visually indistinguishable from the true envelope. The zoomed-in image in Figure 4.21 still shows a very small difference between the two envelopes. This is an expected result. Once the spectrum is narrow, the envelope becomes a slowly changing curve in comparison with the original process. The curvature of the true envelope decreases; therefore the accuracy of its approximation with the broken line increases. Figure 4.22 and Figure 4.23 show the distributions of the values of the peak-based envelope and its derivative. Both figures were calculated in the same way as in the previous case with zero speed. The skip time was 140 seconds, as the autocorrelation in following waves dies out more slowly (see Figure 4.12). As could be expected, both histograms support the theoretical distributions.

As seen from the above considerations, the peak-based envelope is a much better approximation in the case of the narrow spectrum than in the zero-speed case.


Figure 4.20 Peak-based or piecewise linear approximation of the envelope; shown for the record # 19; true envelope is shown with the dotted line. The wave is “recorded” by the “gauge” moving with the waves (pure following seas) with the speed of 15 knots

Figure 4.21 Zoomed-in peak-based or piecewise linear approximation of the envelope; shown for the record # 19; true envelope is shown with the dashed line. The wave is “recorded” by the “gauge” moving with the waves (pure following seas) with the speed of 15 knots

Figure 4.22 Distribution of the peak-based envelope for the following wave case and speed of 15 knots; skip 140 seconds

Figure 4.23 Distribution of the derivative of the peak-based envelope for the following wave case and speed of 15 knots; skip 140 seconds


4.5.3.  Upcrossings  of  Peak-Based  Envelope 

Statistical estimates for the rate of upcrossings of the peak-based envelope are shown in Figure 4.24 for both the zero-speed (a) and the following wave (b) cases. The confidence interval for the estimates was evaluated assuming a binomial distribution (see subsection 1.2.2). For the zero-speed case, the confidence interval of the estimated upcrossing rate does not contain the theoretical value. Similar to the results with distributions (see Figure 4.18 and Figure 4.19), numerical disturbances introduced by the linear interpolation caused the observed difference. Following the same tendency, the upcrossing rates of the true envelope and the peak-based envelope for the following wave, 15-knots case are statistically identical.

Figure 4.24 Theoretical and statistical rate of upcrossings of the true and peak-based envelopes. Level of crossing b = 7.5 m, zero-speed case (a); following waves with speed 15 knots (b)

The significant statistical difference between the theoretical upcrossing rate of the true envelope and the statistical estimate of the upcrossing rate of the peak-based envelope also reflects the statistical significance of the artificial peaks of the true envelope.

Nevertheless, these numerical disturbances did not much affect the Poisson character of upcrossings. The distribution of the number of upcrossings during a given time window remains Poisson for both considered cases of upcrossing of the peak-based envelope, see Figure 4.25.

Table 9 contains results of calculations for systematically changing crossing levels, to see where the applicability of Poisson flow breaks down for the peak-based envelope. These calculations were carried out for both cases: zero speed and following waves at 15 knots (narrow spectrum). Histograms were compared with the probability mass function calculated with the statistical rate of upcrossing, as the theoretical distribution is no longer applicable for the zero-speed case. Using a level of significance of 0.05, one can see that the boundary of applicability of Poisson flow lies somewhere between 7 and 7.5 m for the zero-speed case and between 6 and 6.25 m for the following wave case. These numbers are essentially the same as for the true envelope, see Table 7 for the zero-speed case and Table 8 for the following wave case. Thick lines are used in these tables to show the boundary of applicability.

Comparing the number of upcrossings for the true and peak-based envelopes, the significant difference for the zero-speed case could be expected, as numerical discrepancies were significant enough to change the distribution. The difference in the number of upcrossings for the following sea case is much smaller, but the numbers are not identical, even though there were no visual differences in the appearance of the peak-based and true envelopes for the following sea case.

Figure 4.25 Probability mass function of the number of upcrossings during a time window of the peak-based envelope for the zero-speed case (a) and pure following waves, speed 15 knots (b); crossing level 7.5 m

(a) Total 1019 crossings; number of time windows per record N_W = 8; duration of time window T_k = 225 s; volume of sample 1600; estimate of mean value 0.6369; estimate of variance 0.6029; ratio of mean to variance 1.0564; χ² = 1.87, d = 3, P(χ², d) = 0.6

(b) Total 271 crossings; number of time windows per record N_W = 2; duration of time window T_k = 900 s; volume of sample 400; estimate of mean value 0.6775; estimate of variance 0.6775; ratio of mean to variance 1.0753; χ² = 1.76, d = 3, P(χ², d) = 0.625


Table 9. Applicability of Poisson flow for upcrossing of the peak-based envelope

4.6. Summary

Envelope theory describes the presentation of a stationary stochastic process via two other stationary stochastic processes: the amplitude and phase of the envelope. Most of the theoretical results of envelope theory are applicable only for a normal process; however, some principles of envelope theory can be used for a stationary process with any type of distribution.

The envelope contains the absolute values of all of the peaks of the original process. The peaks of the envelope, however, do not necessarily belong to the original process (artificial peaks).

A portion of envelope theory was reviewed here, including the marginal distributions of amplitude (Rayleigh) and phase (uniform), the autocorrelation function of the envelope, and the distribution of the derivative of the envelope (normal). Numerical examples demonstrated successful reproduction of the theoretical results.

Treating the envelope as a stationary process allows the application of upcrossing theory. It is possible to obtain a closed-form solution for the upcrossing rate of the envelope if the original process is normal. This result was verified numerically.

Upcrossings of the envelope follow Poisson flow if the crossing level is high enough for the upcrossings to be treated as independent random events.

The spectral bandwidth of the process has a significant influence on the envelope. If the spectrum is narrow, the envelope becomes a slowly changing function in comparison with the original process. This was demonstrated with another numerical example of encountered waves; the wave elevations were virtually “recorded” by a “wave probe” moving in the same direction as the waves (pure following seas) with a speed of 15 knots.

It was shown that for the encountered waves Poisson flow is no longer applicable to the process itself. Due to significant clustering or grouping, the upcrossings become dependent on each other.

A piecewise linear approximation of the envelope was considered (the peak-based envelope). The approximation contains only actual peaks of the process, and all other points are calculated by linear interpolation between the peaks.

Numerical discrepancies introduced by this approximation make the theoretical solutions for the distribution and the upcrossing rate of the envelope inapplicable in the original wave example (zero-speed case). However, all theoretical results were applicable for the following wave, 15-knots case.

The applicability of Poisson flow was not affected by the approximation.

5. Envelope Peaks over the Threshold

This  section  describes  a  method  of  statistical  extrapolation  using  probabilistic 
properties  of  the  peaks  of  the  envelope  exceeding  a  given  threshold. 

5.1.  Both-Sides  Crossings 

5.1.1. Large Roll Event as Both-Sides Crossing

Partial stability failure in the form of a large roll event is equally dangerous on
either side of a ship. Therefore, the random event of upcrossing is not yet a complete model
of partial stability failure. A complete model of partial stability failure should include
both upcrossing of a specified level on the positive side and downcrossing of a
specified level on the negative side. This random event can be written as:

$$X = \big((\phi(t) < a) \cap (\phi(t+dt) > a)\big) \cup \big((\phi(t) > b) \cap (\phi(t+dt) < b)\big) \tag{5.1}$$

Here, X is the random event associated with partial stability failure, a is the positive level of
exceedance, and b is the negative level of exceedance. If the mean value of roll is
zero, the requirements are the same for both sides:

$$m(\phi) = 0 \;\Rightarrow\; a = -b \tag{5.2}$$

Consider the probability of a both-sides crossing at a particular instant of time t. As
is known from upcrossing theory (and has been demonstrated earlier in this report), this
probability is infinitely small:

$$dP(X) = P\Big(\big((\phi(t) < a) \cap (\phi(t+dt) > a)\big) \cup \big((\phi(t) > b) \cap (\phi(t+dt) < b)\big)\Big) \tag{5.3}$$

The roll process is single-valued (it has only one value at any instant of time);
therefore it cannot cross two distinct levels simultaneously, and the events of upcrossing
of the level a and downcrossing of the level b at the time instant t are incompatible:

$$P\Big(\big((\phi(t) < a) \cap (\phi(t+dt) > a)\big) \cap \big((\phi(t) > b) \cap (\phi(t+dt) < b)\big)\Big) = 0 \tag{5.4}$$

Therefore: 

$$dP(X) = P\big((\phi(t) < a) \cap (\phi(t+dt) > a)\big) + P\big((\phi(t) > b) \cap (\phi(t+dt) < b)\big) \tag{5.5}$$

Since dt is an infinitely small increment, equation (5.5) can be presented as follows (the value at the
instant t is assumed and the symbol (t) is dropped):

$$\begin{aligned}
dP(X) &= P\big((\phi < a) \cap (\phi + \dot\phi\,dt > a)\big) + P\big((\phi > b) \cap (\phi + \dot\phi\,dt < b)\big) \\
      &= P\big((\phi < a) \cap (\phi > a - \dot\phi\,dt)\big) + P\big((\phi > b) \cap (\phi < b - \dot\phi\,dt)\big)
\end{aligned} \tag{5.6}$$


It  is  obvious  that  upcrossing  occurs  with  positive  roll  rate  and  the  downcrossing 
with  the  negative  roll  rate: 

$$dP(X) = P\big((\phi < a) \cap (\phi > a - \dot\phi\,dt) \cap (\dot\phi > 0)\big) + P\big((\phi > b) \cap (\phi < b - \dot\phi\,dt) \cap (\dot\phi < 0)\big) \tag{5.7}$$

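If the joint distribution of roll and roll rate $f(\phi,\dot\phi)$ is known, the probability can be
expressed as:

$$dP(X) = \int_{a-\dot\phi\,dt}^{a} \int_{0}^{\infty} f(\phi,\dot\phi)\,d\dot\phi\,d\phi + \int_{b}^{b-\dot\phi\,dt} \int_{-\infty}^{0} f(\phi,\dot\phi)\,d\dot\phi\,d\phi \tag{5.8}$$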

Both  external  integrals  in  the  equation  (5.8)  have  limits  that  are  infinitely  close  to 
each  other.  Then  application  of  the  mean  value  theorem  (for  integration)  yields  the 
following: 


$$dP(X) = dt \int_{0}^{\infty} f(a,\dot\phi)\,\dot\phi\,d\dot\phi - dt \int_{-\infty}^{0} f(b,\dot\phi)\,\dot\phi\,d\dot\phi \tag{5.9}$$

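Assuming that the roll motion is a stationary process leads to independence of the
process value and its first derivative:

$$f(\phi,\dot\phi) = f(\phi)\,f(\dot\phi) \tag{5.10}$$

This circumstance allows rewriting equation (5.9) as:

$$dP(X) = dt \left( f(a) \int_{0}^{\infty} f(\dot\phi)\,\dot\phi\,d\dot\phi - f(b) \int_{-\infty}^{0} f(\dot\phi)\,\dot\phi\,d\dot\phi \right) \tag{5.11}$$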


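The expression in parentheses is finite and represents the rate of the random event of
both-sides crossing:

$$dP(X) = \lambda_{ab}\,dt, \qquad \lambda_{ab} = f(a) \int_{0}^{\infty} f(\dot\phi)\,\dot\phi\,d\dot\phi - f(b) \int_{-\infty}^{0} f(\dot\phi)\,\dot\phi\,d\dot\phi \tag{5.12}$$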


Comparison with formula (1.19) shows that the first component is the
rate of upcrossing through the level a. It can be shown that the second component represents
the rate of downcrossings through the level b:

$$\lambda_a = f(a) \int_{0}^{\infty} f(\dot\phi)\,\dot\phi\,d\dot\phi, \qquad \lambda_b = -f(b) \int_{-\infty}^{0} f(\dot\phi)\,\dot\phi\,d\dot\phi \tag{5.13}$$


The  rate  of  downcrossing  is  actually  always  positive  as  the  value  of  the  integral  is 
negative. 

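If the distribution of the roll rate is symmetric, the integrals in (5.13) have the
same absolute value but different signs:

$$\lambda_a = f(a) \int_{0}^{\infty} f(\dot\phi)\,\dot\phi\,d\dot\phi, \qquad \lambda_b = f(b) \int_{0}^{\infty} f(\dot\phi)\,\dot\phi\,d\dot\phi \qquad \text{if } f(\dot\phi) = f(-\dot\phi) \tag{5.14}$$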


Finally, if the roll process has zero mean and its distribution is symmetric, then for $b = -a$:

$$\lambda_{ab} = 2\lambda_a = 2\lambda_b = 2 f(a) \int_{0}^{\infty} f(\dot\phi)\,\dot\phi\,d\dot\phi \qquad \text{if } b = -a \,\cap\, f(\phi) = f(-\phi) \tag{5.15}$$


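In particular, for a generic normal process x(t), with $V_x$ the variance of the process and $V_{\dot x}$ the variance of its derivative:

$$\lambda_{ab} = \frac{1}{\pi} \sqrt{\frac{V_{\dot x}}{V_x}} \exp\left(-\frac{a^2}{2 V_x}\right) \tag{5.16}$$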
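The step from the general both-sides rate (5.15) to the closed form (5.16) can be checked numerically. The sketch below is only an illustration: the function names and the variance values are hypothetical and are not taken from the report's data set. It evaluates the derivative integral of (5.15) by the trapezoid rule for normal densities and compares the result with (5.16).

```python
import math


def both_sides_rate_normal(a, var_x, var_xdot):
    """Closed-form both-sides crossing rate of levels +a and -a, formula (5.16)."""
    return math.sqrt(var_xdot / var_x) / math.pi * math.exp(-a * a / (2.0 * var_x))


def normal_pdf(u, var):
    """Zero-mean normal density with variance var."""
    return math.exp(-u * u / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def both_sides_rate_numeric(a, var_x, var_xdot, steps=20000):
    """Formula (5.15), with the integral over the derivative done by the trapezoid rule."""
    upper = 10.0 * math.sqrt(var_xdot)  # the tail beyond 10 standard deviations is negligible
    h = upper / steps
    total = 0.0
    for k in range(steps + 1):
        u = k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * u * normal_pdf(u, var_xdot)
    return 2.0 * normal_pdf(a, var_x) * total * h


# Illustrative (hypothetical) variances and level
VAR_X, VAR_XDOT, LEVEL = 4.0, 1.0, 3.0
closed_form = both_sides_rate_normal(LEVEL, VAR_X, VAR_XDOT)
numeric = both_sides_rate_numeric(LEVEL, VAR_X, VAR_XDOT)
```

The two values agree to within the quadrature error, which is what one would expect if (5.16) is twice the classical upcrossing rate of a normal process.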


A statistical estimate of the both-sides crossing rate can be obtained as the average
number of crossings of both levels per unit of time:

$$\hat\lambda_{ab} = \frac{m_U^* + m_D^*}{T_R} \tag{5.17}$$

Here $m_U^*$ is an estimate of the mean number of upcrossings through the level a,
$m_D^*$ is an estimate of the mean number of downcrossings through the
level b, and $T_R$ is the duration of the record.

5.1.2. Confidence Interval for Both-Sides Crossings

Following the same steps as in the case of upcrossings, consider a sample of a
stochastic process x, presented in the form of an ensemble of $N_R$ records. Each record is
represented by a time history of $N_{PT}$ points with the time step $\Delta t$, totaling $n = N_{PT} - 1$ time
steps. Then the event of both-sides crossing of the level a can be associated with a
random variable D defined for each time step as follows (see Figure 5.1):

$$D_{i,j} = \begin{cases} 1 & \big(x_{i,j} < a \cap x_{i+1,j} > a\big) \cup \big(x_{i,j} > -a \cap x_{i+1,j} < -a\big) \\ 0 & \text{otherwise} \end{cases} \qquad i = 1,\ldots,n; \quad j = 1,\ldots,N_R \tag{5.18}$$


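Total number of both-sides crossings:

$$N_D = \sum_{i=1}^{n} \sum_{j=1}^{N_R} D_{i,j} \tag{5.19}$$

Estimate of the probability that a both-sides crossing will occur at any given instant
of time:

$$\hat p_D = \frac{N_D}{n N_R} = \frac{1}{n N_R} \sum_{i=1}^{n} \sum_{j=1}^{N_R} D_{i,j} \tag{5.20}$$

Mean number of both-sides crossings per record:

$$\hat m_D = \frac{N_D}{N_R} = \frac{1}{N_R} \sum_{i=1}^{n} \sum_{j=1}^{N_R} D_{i,j} \tag{5.21}$$

The estimate of the rate of both-sides crossing can also be expressed through the
characteristics of the auxiliary variable:

$$\hat\lambda_{ab} = \frac{\hat m_D}{n \Delta t} = \frac{1}{n N_R \Delta t} \sum_{i=1}^{n} \sum_{j=1}^{N_R} D_{i,j} = \frac{\hat p_D}{\Delta t} \tag{5.22}$$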
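The counting estimates (5.19) through (5.22) can be sketched in a few lines. This is a minimal illustration, assuming the symmetric case b = -a and records stored as plain Python lists; the function name and the synthetic sine record are hypothetical and only serve to make the counts easy to verify by hand.

```python
import math


def both_sides_crossing_stats(records, a, dt):
    """Estimates (5.19)-(5.22) for upcrossings of +a and downcrossings of -a."""
    n_rec = len(records)
    n = len(records[0]) - 1              # number of time steps per record
    n_d = 0                              # total number of both-sides crossings, (5.19)
    for x in records:
        for i in range(n):
            up = x[i] < a and x[i + 1] > a
            down = x[i] > -a and x[i + 1] < -a
            n_d += 1 if (up or down) else 0   # indicator D for this time step
    p_hat = n_d / (n * n_rec)            # probability per time step, (5.20)
    m_hat = n_d / n_rec                  # mean number of crossings per record, (5.21)
    rate_hat = p_hat / dt                # crossing rate, (5.22)
    return n_d, p_hat, m_hat, rate_hat


# Synthetic check: two identical sine records with a 10 s period, 100 s long
DT = 0.1
record = [math.sin(2.0 * math.pi * (k * DT + 0.03) / 10.0) for k in range(1001)]
n_d, p_hat, m_hat, rate_hat = both_sides_crossing_stats([record, record], 0.5, DT)
```

For the sine record each 10-second period contains one upcrossing of +0.5 and one downcrossing of -0.5, so the estimated rate is 0.2 crossings per second.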


Evaluation of the confidence interval for the estimate of the rate of both-sides
crossing encounters certain methodological difficulties. The method of evaluation of the
confidence interval described in Section 1 was based on the binomial distribution of
the auxiliary variable. The binomial distribution assumes Bernoulli trials, which implies
the independence of consecutive events. If the level is high enough, the upcrossings
are independent events; as a result, the time between them has an exponential
distribution and the number of upcrossings during a finite duration of time has a
Poisson distribution. Consecutive both-sides crossings can be as close to each other as
half a period. This is not enough time for the autocorrelation function to die out; therefore
neighboring both-sides crossings may be dependent.

On the other hand, if time is fixed, all of the events are independent because the records
are independent. In this case, the conditions of Bernoulli trials are satisfied and the
distribution of the auxiliary variable is binomial. Independence of the events in the
time section, however, does not necessarily lead to an exponential distribution of the time
between events or a Poisson distribution of the number of events during a fixed time, as these
quantities require temporal consideration and do not exist in the time section.

The binomial distribution depends on the probability of the event occurring at a
particular instant of time, estimated by formula (5.20). Both averaging over the time section
(averaging over all the records at a given instant of time) and temporal averaging
are present here. This allows for mitigating possible errors in the evaluation of the
confidence interval. Further evaluation was done in a similar way to that described in
Section 1.
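The procedure of Section 1 is not reproduced in this section, so the following is only a hedged sketch of one way such bounds could be computed. It assumes that the binomial distribution of the total count (a very small probability over a very large number of time steps) is approximated by a Poisson distribution with the observed count as its mean, and that the confidence probability is 0.95; both are assumptions of this sketch, as are the function names.

```python
import math


def poisson_quantile(mean, prob):
    """Smallest count k whose Poisson cumulative probability reaches prob."""
    if mean > 700.0:
        raise ValueError("exp(-mean) underflows; use a normal approximation instead")
    term = math.exp(-mean)               # P(K = 0)
    cdf = term
    k = 0
    while cdf < prob:
        k += 1
        term *= mean / k                 # P(K = k) from P(K = k - 1)
        cdf += term
    return k


def rate_confidence_bounds(n_crossings, total_time, confidence=0.95):
    """Lower bound, point estimate, and upper bound of the crossing rate."""
    half_tail = 0.5 * (1.0 - confidence)
    low = poisson_quantile(n_crossings, half_tail)
    high = poisson_quantile(n_crossings, 1.0 - half_tail)
    return low / total_time, n_crossings / total_time, high / total_time
```

With this sketch, observed counts of 9 and 3 crossings give count bounds of (4, 15) and (0, 7), respectively; dividing by the total duration of all records converts them into bounds on the rate.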

Figure 5.2 shows this estimate calculated for the numerical example. The crossing
level is ±9 m; as the process used in the example has zero mean and a normal distribution
(wave elevations), formula (5.16) was used for the theoretical value. As is clearly seen
from this figure, the theoretical value is included in the confidence interval of the
estimate. Therefore, the derivations above, and formula (5.16) in particular, are not
rejected by the numerical example.

Figure 5.2 Both-sides crossing rate: theoretical value and statistical estimate by counting. Crossing level ±9 m

5.1.3. Applicability of Poisson Flow to Both-Sides Crossing

The theoretical derivation and statistical estimation of the rate of both-sides crossing were
very similar to those for upcrossing. However, the random event of both-sides crossing can hardly
be expected to follow Poisson flow, because the crossing of one side may be too close in
time to the crossing of the other side for the independence condition to apply.

This can be seen very clearly from the numerical example considered throughout
this report. As was shown in Section 1, Poisson flow becomes applicable to the
upcrossing event when the level of crossing exceeds a value between 5.25 and 5.5 m,
while the autocorrelation function could be considered to have died out after about 40-45
seconds. The mean time between crossings was somewhere between 75 and 89 seconds;
this provided enough separation between events to consider them independent.

For both-sides crossing, the time between crossing one side and the other can be as
small as half of the zero-crossing period. For the considered example it is only about 5.8
seconds, which is obviously not enough to ensure independence. However, it is possible
that the applicability of Poisson flow may be preserved for very high levels, where a
statistically significant number of cases with only one-side crossing can exist.

To check the hypothesis above, a series of calculations for systematically changed
levels was performed for the considered numerical example. Using the method described
in subsection 1.3.5 and keeping the maximum number of crossings per window
around 7-9 (if possible), the boundary of applicability was determined to be between 10
and 10.5 m; see Table 10.
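The direct test compares the number of crossings per window with a Poisson distribution. The method of subsection 1.3.5 is not given in this section, so the sketch below is a generic textbook version under stated assumptions: counts are binned by value, the last bin collects the tail, and the degrees of freedom follow the usual rule (bins minus one, minus one more if the mean is estimated from the same counts). The binning used in the report may differ, and the function names are hypothetical. The second helper computes the ratio of the mean to the variance of the counts, which should be close to one for Poisson flow.

```python
import math


def poisson_pmf(k, mean):
    """Poisson probability of exactly k events."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def chi_square_poisson(window_counts, mean=None):
    """Pearson chi-square statistic of per-window crossing counts against Poisson.

    If mean is None it is estimated from the counts, which costs one more
    degree of freedom.
    """
    n_win = len(window_counts)
    estimated = mean is None
    if estimated:
        mean = sum(window_counts) / n_win
    k_max = max(window_counts)
    observed = [0] * (k_max + 1)
    for c in window_counts:
        observed[c] += 1
    chi2 = 0.0
    for k in range(k_max + 1):
        p = poisson_pmf(k, mean)
        if k == k_max:                   # last bin collects the tail k >= k_max
            p = 1.0 - sum(poisson_pmf(j, mean) for j in range(k_max))
        expected = n_win * p
        chi2 += (observed[k] - expected) ** 2 / expected
    dof = k_max - (1 if estimated else 0)   # bins - 1, minus estimated parameters
    return chi2, dof


def mean_to_variance_ratio(window_counts):
    """Ratio of sample mean to sample variance of the counts (about 1 for Poisson)."""
    n_win = len(window_counts)
    m = sum(window_counts) / n_win
    v = sum((c - m) ** 2 for c in window_counts) / (n_win - 1)
    return m / v
```

The statistic would then be compared with the chi-square distribution for the returned number of degrees of freedom; that last step is omitted here to keep the sketch free of external libraries.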


Table 10. Test on applicability of Poisson flow for both-sides crossings

Pearson chi-square test for Poisson distribution; the first χ², d, P group is based on formula (5.16), the second on the averaged number of crossings.

| Level, m | Number of crossings | N_w | m_k/V_k | k_max | χ² (formula 5.16) | d | P(χ², d) | χ² (averaged) | d | P(χ², d) |
|---|---|---|---|---|---|---|---|---|---|---|
| 12 | 3 | 1 | 1.01 | 2 | 0.9491 | 1 | 0.33 | 0.0007 | 0 | - |
| 11.5 | 9 | 1 | 1.04 | 3 | 0.5928 | 2 | 0.74 | 0.212 | 1 | 0.6452 |
| 11.0 | 22 | 1 | 0.93 | 3 | 0.9024 | 2 | 0.64 | 0.92 | 1 | 0.3359 |
| 10.5 | 54 | 1 | 0.85 | 4 | 7.0184 | 3 | 0.07 | 4.38 | 2 | 0.1119 |
| 10.0 | 101 | 1 | 0.76 | 5 | 38.895 | 4 | 7.3E-8 | 25.062 | 3 | 1.50E-5 |
| 9.5 | 179 | 2 | 0.77 | 6 | 51.8343 | 5 | 5.8E-10 | 40.66 | 4 | 3.16E-8 |
| 9.0 | 307 | 2 | 0.73 | 6 | 30.142 | 5 | 1.38E-5 | 28.99 | 4 | 7.85E-6 |
| 8.5 | 517 | 2 | 0.87 | 7 | 13.4984 | 6 | 0.0358 | 13.6851 | 5 | 0.0177 |
| 8.0 | 860 | 4 | 0.75 | 7 | 65.106 | 6 | 4.10E-12 | 67.867 | 5 | 2.85E-13 |
| 7.5 | 1441 | 12 | 0.65 | 7 | 370.7 | 6 | 0 | 391.1 | 5 | 0 |
| 7 | 2266 | 18 | 0.66 | 8 | 505.2 | 7 | 0 | 542.1 | 6 | 0 |
| 6.5 | 3514 | 36 | 0.65 | 7 | 1011 | 6 | 0 | 1068 | 5 | 0 |
| 6.0 | 5354 | 36 | 0.66 | 7 | 1015 | 6 | 0 | 1031 | 5 | 0 |
| 5.5 | 7676 | 36 | 0.6969 | 8 | 900.0 | 7 | 0 | 904.39 | 6 | 0 |
| 5.0 | 10852 | 36 | 0.734 | 8 | 875.86 | 7 | 0 | 870.56 | 6 | 0 |
| 4.5 | 14860 | 36 | 0.8186 | 9 | 666.76 | 8 | 0 | 665.69 | 7 | 0 |
| 4.0 | 19482 | 36 | 0.9532 | 9 | 496.39 | 8 | 0 | 494.97 | 7 | 0 |

The outcome of the direct Poisson applicability test for both-sides crossings
can be sensitive to the window size. The results shown in Table 11 were
obtained with the specific purpose of taking the method beyond its breaking point. A
similar procedure was carried out for upcrossings in Section 1; there it was found that the
results of the direct applicability test were not sensitive to the size of the window. The behavior of
both-sides crossings was found to be different: the results are sensitive to the window
size.


At all levels (with the exception of 9.5 m), the applicability of Poisson flow was
not rejected if the larger windows were used. Fractions in the column for the number of
windows (marked $N_w$) mean that the window was of a larger duration than a record. For
example, $N_w = 1/4$ means that four records make one window, while $N_w = 1.2$ means that
each window uses one record over its full length and 20% of the length of the next record.

The results in Table 11 cannot be explained other than as a numerical artifact.
Properties of Poisson flow cannot be supported if the events are not independent. This was
seen quite clearly from a number of calculations discussed in Section 1: once events are
too close to each other, while the autocorrelation function has not yet died out, the
hypothesis of Poisson flow was clearly rejected. To verify that the results in Table 11 are, in
fact, a numerical artifact, another method of testing was applied.

Table 12 shows the results of a Kolmogorov-Smirnov goodness-of-fit test (K-S
test) applied as described in Section 1. Its application is completely justified if a
theoretical distribution is used; however, it may be too "optimistic" about the statistical fit, as
it does not have a mechanism to apply a penalty for statistically estimated parameters (see
Section 1). The results in Table 12, however, do not show large differences in judgment on
the applicability of Poisson flow to both-sides crossings.


Table 11. Test on applicability of Poisson flow for both-sides crossings (increased window size)

Pearson chi-square test for Poisson distribution; the first χ², d, P group is based on formula (5.16), the second on the averaged number of crossings.

| Level, m | Number of crossings | N_w | m_k/V_k | k_max | χ² (formula 5.16) | d | P(χ², d) | χ² (averaged) | d | P(χ², d) |
|---|---|---|---|---|---|---|---|---|---|---|
| 12 | 3 | 1/4 | 1.0426 | 2 | 0.7626 | 1 | 0.38 | 0.11 | | - |
| 11.5 | 9 | 1/4 | 0.94 | 4 | 0.6459 | 3 | 0.8859 | 0.2322 | 2 | 0.8904 |
| 11.0 | 22 | 1/4 | 0.88 | 4 | 0.7307 | 3 | 0.866 | 0.7651 | 2 | 0.6821 |
| 10.5 | 54 | 1/4 | 1.0652 | 4 | 4.7329 | 3 | 0.1924 | 1.7518 | 2 | 0.4165 |
| 10.0 | 101 | 1/4 | 1.0206 | 8 | 10.4581 | 7 | 0.1641 | 5.137 | 6 | 0.5264 |
| 9.5 | 179 | 1 | 0.82 | 7 | 24.615 | 6 | 0.0004 | 16.963 | 5 | 0.0046 |
| 9.0 | 307 | 1 | 0.88 | 8 | 7.4806 | 7 | 0.38 | 6.9313 | 6 | 0.3272 |
| 8.5 | 517 | 1 | 1.1 | 8 | 2.3842 | 7 | 0.9356 | 2.3616 | 6 | 0.8836 |
| 8.0 | 860 | 1 | 1.2 | 10 | 12.4697 | 9 | 0.2549 | 11.5471 | 8 | 0.2401 |
| 7.5 | 1441 | 1 | 1.21 | 16 | 11.0447 | 15 | 0.7494 | 10.9491 | 14 | 0.69 |
| 7 | 2266 | 12 | 0.8675 | 21 | 26.3937 | 20 | 0.1532 | 21.2171 | 19 | 0.3249 |
| 6.5 | 3514 | 1.2 | 0.9172 | 27 | 21.9401 | 26 | 0.692 | 17.6174 | 25 | 0.9172 |
| 6.0 | 5354 | 1.2 | 1.0115 | 37 | 29.276 | 36 | 0.7788 | 26.1945 | 35 | 0.8588 |
| 5.5 | 7676 | 2 | 0.9526 | 33 | 40.97 | 32 | 0.1329 | 34.0338 | 31 | 0.3236 |
| 5.0 | 10852 | 2 | 1.0136 | 41 | 28.7651 | 40 | 0.9068 | 26.5423 | 39 | 0.9356 |
| 4.5 | 14860 | 2 | 1.1773 | 54 | 34.3761 | 53 | 0.9779 | 34.4415 | 52 | 0.9712 |
| 4.0 | 19482 | 3 | 1.1015 | 47 | 18.68 | 46 | 0.999 | 19.0462 | 45 | 0.998 |

Table 12. Kolmogorov-Smirnov test on applicability of Poisson flow for both-sides crossings

The first group of three columns is based on formula (5.16), the second on the averaged number of crossings.

| Level ±, m | Number of crossings | Max difference (formula 5.16) | Value of criterion | Probability that fit is good | Max difference (averaged) | Value of criterion | Probability that fit is good |
|---|---|---|---|---|---|---|---|
| 12 | 3 | 0.0139 | 0.0241 | 1 | 0.0057 | 0.0099 | 1 |
| 11.5 | 9 | 0.0147 | 0.0442 | 1 | 0.0067 | 0.0202 | 1 |
| 11.0 | 22 | 0.0151 | 0.0706 | 1 | 0.0108 | 0.0507 | 1 |
| 10.5 | 54 | 0.0229 | 0.1682 | 1 | 0.0216 | 0.1589 | 1 |
| 10.0 | 101 | 0.0201 | 0.2024 | 1 | 0.0503 | 0.5053 | 0.9604 |
| 9.5 | 179 | 0.0558 | 0.747 | 0.6323 | 0.0785 | 1.0496 | 0.2205 |
| 9.0 | 307 | 0.0709 | 1.2419 | 0.0915 | 0.0754 | 1.3212 | 0.0609 |
| 8.5 | 517 | 0.0904 | 2.0553 | 0.0004 | 0.0812 | 1.8461 | 0.0022 |
| 8.0 | 860 | 0.1046 | 3.0662 | 1.3648E-8 | 0.0878 | 2.5759 | 3.4488E-6 |
| 7.5 | 1441 | 0.1187 | 4.5071 | 0 | 0.109 | 4.1374 | 2.66E-15 |
| 7 | 2266 | 0.1235 | 5.8811 | 0 | 0.1094 | 5.2094 | 0 |
| 6.5 | 3514 | 0.1314 | 7.7869 | 0 | 0.1193 | 7.0746 | 0 |
| 6.0 | 5354 | 0.1302 | 9.5234 | 0 | 0.1258 | 9.2021 | 0 |
| 5.5 | 7676 | 0.1249 | 10.9419 | 0 | 0.1174 | 10.2896 | 0 |
| 5.0 | 10852 | 0.107 | 11.1445 | 0 | 0.1036 | 10.7952 | 0 |
| 4.5 | 14860 | 0.081 | 9.8696 | 0 | 0.0804 | 9.7988 | 0 |
| 4.0 | 19482 | 0.0579 | 8.0867 | 0 | 0.0569 | 7.9386 | 0 |

The theoretical distribution (5.16) and the statistical fit both indicate applicability of
Poisson flow above a level located between 8.5 and 9 m. This is somewhat lower than the
boundary found by the direct test of applicability of Poisson flow in Table 10, where it
was between 10 and 10.5 m. Such a discrepancy, however, was also observed when
applying both tests to one-side upcrossings in Section 1.
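The relation between the columns of Table 12 can be illustrated with a short sketch. It assumes that the criterion is the maximum difference multiplied by the square root of the number of crossings and that the probability is given by the Kolmogorov limiting distribution; both assumptions appear consistent with the tabulated values, but the exact procedure is the one described in Section 1. The function names are hypothetical.

```python
import math


def ks_criterion(max_difference, n_events):
    """Kolmogorov criterion: largest CDF difference scaled by sqrt of the sample size."""
    return max_difference * math.sqrt(n_events)


def ks_probability(criterion, tol=1e-15, max_terms=100000):
    """Kolmogorov survival function 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 c^2)."""
    if criterion <= 0.0:
        return 1.0
    total = 0.0
    for k in range(1, max_terms + 1):
        term = math.exp(-2.0 * (k * criterion) ** 2)
        total += term if k % 2 == 1 else -term
        if term < tol:                   # remaining terms are negligible
            break
    return min(1.0, max(0.0, 2.0 * total))
```

For example, a criterion of 1.2419 gives a probability of about 0.0915 and a criterion of 0.747 gives about 0.632, close to the entries for the 9.0 m and 9.5 m levels in Table 12.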

The direct test of the applicability of Poisson flow seems to be more conservative
than the K-S test. However, in the case of both-sides crossing, it may give the wrong
answer if the size of the window is too large. To ensure reliable judgment of the applicability
of Poisson flow, the direct test needs to be complemented by the K-S test.

5.1.4.  Relation  between  Both-sides  Crossing  and  Absolute  Value  of  Peaks 

Consider a sample of a stochastic process x, presented in the form of an ensemble of
$N_R$ records. Each record is represented by a time history of $N_{PT}$ points with the time step
$\Delta t$, totaling $n = N_{PT} - 1$ time steps. Then the event of occurrence of a peak exceeding a
given threshold or level a is associated with an auxiliary random variable Z defined
for each time step as follows (see Figure 5.3):

$$Z_{i,j} = \begin{cases} 1 & |x_{i,j}| > a \,\cap\, |x_{i,j}| > |x_{i-1,j}| \,\cap\, |x_{i,j}| > |x_{i+1,j}| \\ 0 & \text{otherwise} \end{cases} \qquad i = 1,\ldots,n; \quad j = 1,\ldots,N_R \tag{5.23}$$


Figure  5.3  Auxiliary  random  variable  for  peak  over  the  threshold 


This random variable Z is defined analogously to the auxiliary random variable D;
see the previous subsection. Following the same logic, the total number of all peaks over the threshold is
just the sum of the values of the auxiliary variable for all time steps of all records:

$$N_Z = \sum_{i=1}^{n} \sum_{j=1}^{N_R} Z_{i,j} \tag{5.24}$$

An estimate of the probability that a peak exceeding the threshold will occur at
any given instant of time:

$$\hat p_Z = \frac{N_Z}{n N_R} \tag{5.25}$$

The mean number of peaks over the threshold per record:

$$\hat m_Z = \frac{N_Z}{N_R} \tag{5.26}$$

The rate of events for the peak over the threshold can be introduced analogously
to the rate of upcrossing. Its estimate over a finite volume of data is defined as:

$$\hat\lambda_Z = \frac{\hat m_Z}{n \Delta t} = \frac{\hat p_Z}{\Delta t} \tag{5.27}$$

The theoretical definition can be obtained as the result of a limit transition for
an infinite number of records and an infinitely small time step:

$$\lambda_Z = \lim_{\substack{N_R \to \infty \\ \Delta t \to 0}} \hat\lambda_Z \tag{5.28}$$

The discussion of applicability of the binomial distribution for the auxiliary variable
Z follows the same thread as for the auxiliary variable D in the case of both-sides
crossings. Figure 5.4 shows an example calculation at the level/threshold ±9 m. As can
be seen from that figure, the estimate of the rate of peaks over the threshold is statistically
identical to the estimate of the rate of both-sides crossing. The theoretical result for
the rate of both-sides crossing is not rejected by either of the estimates.
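The comparison between peaks over the threshold and both-sides crossings can be sketched on a single record. The code below is an illustration under stated assumptions: a peak is taken as a strict local maximum of the absolute value of the sampled process, the levels are symmetric (±a), and the function names and the synthetic sine record are hypothetical.

```python
import math


def count_peaks_over_threshold(x, a):
    """Number of local maxima of |x| exceeding the threshold a (indicator Z)."""
    count = 0
    for i in range(1, len(x) - 1):
        here = abs(x[i])
        if here > a and here > abs(x[i - 1]) and here > abs(x[i + 1]):
            count += 1
    return count


def count_both_sides_crossings(x, a):
    """Number of upcrossings of +a plus downcrossings of -a (indicator D)."""
    count = 0
    for i in range(len(x) - 1):
        up = x[i] < a and x[i + 1] > a
        down = x[i] > -a and x[i + 1] < -a
        count += 1 if (up or down) else 0
    return count


# Synthetic record: sine with a 10 s period, 100 s long, sampled at 0.1 s
DT = 0.1
record = [math.sin(2.0 * math.pi * (k * DT + 0.03) / 10.0) for k in range(1001)]
n_peaks = count_peaks_over_threshold(record, 0.5)
n_cross = count_both_sides_crossings(record, 0.5)
```

For this regular record every excursion beyond the threshold contains exactly one peak, so the two counts coincide. In an irregular record the counts can differ when an excursion contains more than one peak above the threshold, which is why the text compares the two estimates statistically.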


Figure 5.4 Both-sides crossing rate: theoretical value and statistical estimate by counting vs. rate of peaks over the threshold. Crossing level ±9 m


Figure 5.5 shows a comparison between the theoretical values of the both-sides
crossing rate, its estimate by counting, and the estimate of the rate of the absolute value of
the peaks. The points and the curve practically coincide. The confidence interval was too
tight to plot on the figure.

To show the confidence interval, a log scale was used; see Figure 5.6. Even in the
log scale the confidence interval remains too tight to be drawn, with the exception of the
level 10 m and higher. To avoid any further cluttering, the smaller of the two upper
boundaries was shown for the upper limit, while the larger of the lower boundaries was
shown for the lower limit. All the numbers are also shown in Table 13.

Figure 5.5 Theoretical value of both-sides crossing rate (red curve), statistical estimate of both-sides crossing rate by counting (circles) and statistical estimate of rate of absolute value of peaks over the threshold (crosses).

Figure 5.6 Theoretical value of both-sides crossing rate (red curve), statistical estimate of both-sides crossing rate by counting (circles) and statistical estimate of rate of absolute value of peaks over the threshold (crosses).

Table 13. Both-sides crossing rates and rates of peaks

| Crossing level / threshold, m | Theoretical both-sides crossing rate | Crossing rate estimate: low | Crossing rate estimate: mid | Crossing rate estimate: upper | Peak rate estimate: low | Peak rate estimate: mid | Peak rate estimate: upper |
|---|---|---|---|---|---|---|---|
| 4 | 0.0543 | 0.0539 | 0.05412 | 0.0554 | 0.05417 | 0.05439 | 0.05568 |
| 4.5 | 0.04134 | 0.04102 | 0.04128 | 0.04234 | 0.04131 | 0.04156 | 0.04264 |
| 5 | 0.03049 | 0.02988 | 0.03014 | 0.03101 | 0.03013 | 0.0304 | 0.03127 |
| 5.5 | 0.02177 | 0.02106 | 0.02132 | 0.02201 | 0.02126 | 0.02152 | 0.02221 |
| 6 | 0.01506 | 0.01462 | 0.01487 | 0.01542 | 0.01479 | 0.01504 | 0.01559 |
| 6.5 | 0.01009 | 9.54E-03 | 9.76E-03 | 0.01018 | 9.66E-03 | 9.88E-03 | 0.01031 |
| 7 | 6.54E-03 | 6.10E-03 | 6.29E-03 | 6.62E-03 | 6.19E-03 | 6.39E-03 | 6.71E-03 |
| 7.5 | 4.11E-03 | 3.84E-03 | 4.00E-03 | 4.25E-03 | 3.90E-03 | 4.07E-03 | 4.32E-03 |
| 8 | 2.50E-03 | 2.25E-03 | 2.39E-03 | 2.58E-03 | 2.31E-03 | 2.45E-03 | 2.64E-03 |
| 8.5 | 1.47E-03 | 1.33E-03 | 1.44E-03 | 1.58E-03 | 1.36E-03 | 1.47E-03 | 1.61E-03 |
| 9 | 8.41E-04 | 7.67E-04 | 8.53E-04 | 9.58E-04 | 7.86E-04 | 8.72E-04 | 9.78E-04 |
| 9.5 | 4.65E-04 | 4.31E-04 | 4.97E-04 | 5.78E-04 | 4.44E-04 | 5.11E-04 | 5.92E-04 |
| 10 | 2.49E-04 | 2.31E-04 | 2.81E-04 | 3.39E-04 | 2.36E-04 | 2.86E-04 | 3.44E-04 |
| 10.5 | 1.29E-04 | 1.14E-04 | 1.50E-04 | 1.92E-04 | 1.14E-04 | 1.53E-04 | 1.97E-04 |
| 11 | 6.47E-05 | 3.61E-05 | 6.11E-05 | 8.89E-05 | 3.89E-05 | 6.39E-05 | 9.17E-05 |
| 11.5 | 3.14E-05 | 1.11E-05 | 2.50E-05 | 4.17E-05 | 1.11E-05 | 2.78E-05 | 4.72E-05 |
| 12 | 1.48E-05 | 0 | 8.33E-06 | 1.94E-05 | 0 | 8.33E-06 | 1.94E-05 |

The numbers in Table 13 confirm that estimates of the rates of
the absolute value of peaks over the thresholds are statistically identical to estimates of
the both-sides crossing rate. The confidence intervals of both estimates contain the theoretical
value of the both-sides crossing rate. Therefore, statistics of the absolute
value of peaks over the threshold can be used to characterize events of both-sides
crossing.

5.2. Theoretical  Solution  for  Both-sides  Crossings 

The theoretical solution for the upcrossing problem was readily available. This is not the
case for the peak-based envelope. At the same time, an approximate theoretical solution
is needed to verify the extrapolation method being developed.


5.2.1. Distribution of Absolute Value of Peaks

The  total  number  of  peaks  (both  positive  and  negative)  found  in  the  wave 
elevation  sample  data  set  was  62,135.  The  absolute  value  of  a  peak  is  defined  as: 


$$x_{peak} = |x(t)| \quad \text{at} \quad \dot x(t) = 0 \tag{5.29}$$


Figure 5.7 is a histogram of absolute values of peaks with a superimposed Rayleigh
distribution. The histogram is somewhat similar to the distribution of positive peaks (see
Section 3); however, it cannot have any negative values by definition (5.29). It shows
larger values in the vicinity of zero in comparison with the histogram of positive peaks;
the value of the first bucket is well above 0.1, while the first bucket of the positive peaks
is below 0.1.

Figure 5.7 Histogram of absolute values of peaks of wave elevations with superimposed Rayleigh distribution


Obviously, the absolute values of peaks do not follow a Rayleigh distribution.
This result can be explained in a way similar to the case of positive peaks. The reason
why a Rayleigh distribution is inapplicable as a whole is the existence and statistical
influence of secondary peaks. It is known that the number (and statistical influence) of
secondary peaks depends on the spectrum bandwidth. As was already mentioned in
Section 3, peaks of a normal process with "moderate" bandwidth follow a Rice
distribution. Once the bandwidth becomes very large, the autocorrelation function dies
out very quickly and the process becomes effectively white noise. If there is no
autocorrelation, peaks are encountered totally randomly; they become distributed just like
any other value of the process. Therefore, in the limiting case of a very wide spectrum,
the peaks take a normal distribution. The other limiting case is a very narrow spectrum. As
was mentioned in Section 3, the distribution of peaks of a narrow-banded process follows
the Rayleigh distribution. This can also be explained by the fact that a narrow spectrum makes the
envelope a slowly changing function, and the variance of the zero-crossing period
becomes very small. It also means that there are very few secondary peaks; they become
statistically insignificant. The envelope contains all of the peaks; sampling the envelope
with an almost constant step makes the peaks keep the distribution of the envelope, i.e.,
the Rayleigh distribution. This was demonstrated in Section 3 for the positive peaks of wave
elevations recorded from the moving gauges (encounter wave sample).

Similar to the distribution of positive peaks, the distribution of absolute values of
peaks follows a truncated Rayleigh distribution (see Section 3) starting from a certain
bucket:

$$f_{tr}(a) = A_n(a_t)\,\frac{a}{V_x} \exp\left(-\frac{a^2}{2 V_x}\right), \qquad a > a_t \tag{5.30}$$

Here $A_n(a_t)$ is the normalization coefficient calculated at the truncation value $a_t$. The
following formula was derived in Section 3 for the truncation coefficient:

$$A_n(a_t) = \exp\left(\frac{a_t^2}{2 V_x}\right) \tag{5.31}$$

Substitution of formula (5.31) into formula (5.30) gives the following expression
for the truncated Rayleigh distribution:

$$f_{tr}(a) = \frac{a}{V_x} \exp\left(-\frac{a^2 - a_t^2}{2 V_x}\right), \qquad a > a_t \tag{5.32}$$

The cumulative distribution is expressed as:

$$F_{tr}(a) = 1 - \exp\left(-\frac{a^2 - a_t^2}{2 V_x}\right), \qquad a > a_t \tag{5.33}$$
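The truncated Rayleigh density and cumulative distribution of (5.32) and (5.33) can be written down directly. The sketch below is a minimal illustration; the function names are hypothetical, `var_x` stands for the variance of the process, and `a_t` is the truncation value. Below the truncation value both functions return zero, which is an assumption about how the truncated distribution is extended to the whole axis.

```python
import math


def truncated_rayleigh_pdf(a, a_t, var_x):
    """Density of the Rayleigh distribution truncated from below at a_t, as in (5.32)."""
    if a < a_t:
        return 0.0
    return a / var_x * math.exp(-(a * a - a_t * a_t) / (2.0 * var_x))


def truncated_rayleigh_cdf(a, a_t, var_x):
    """Cumulative distribution of the truncated Rayleigh distribution, as in (5.33)."""
    if a < a_t:
        return 0.0
    return 1.0 - math.exp(-(a * a - a_t * a_t) / (2.0 * var_x))
```

The cumulative distribution is zero at the truncation value and tends to one for large amplitudes, so the normalization coefficient (5.31) indeed rescales the Rayleigh tail to unit area.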


The truncated Rayleigh distribution and truncated histogram are shown in Figure
5.8; the value of truncation has been chosen to pass the Pearson chi-square goodness-of-fit
test. The results of the Pearson chi-square goodness-of-fit test are also shown in Figure
5.8.


The explanation is similar to the one given in Section 3. Secondary peaks are
relatively small. Large peaks are primary peaks; therefore they belong to the envelope. It
is also known from the theory of the envelope that the conditional variance of the period
decreases when the amplitude increases. This means that large-amplitude oscillations
have periods very close to the mean period; an illustration of this effect can be seen in
the appendix to (Belenky and Bassler 2010). Again, if the peaks belong to the envelope
and are sampled with an almost constant step, they keep the distribution of the envelope,
i.e., the Rayleigh distribution.


Consider the sample with a narrow spectrum (see Section 4), created by the wave
elevations "recorded" with a "gauge" moving in the same direction as the waves
(following encounter waves). It was shown in Section 3 that the distribution of its
positive peaks is closer to Rayleigh; the positive peaks start following a truncated
Rayleigh distribution from an amplitude of 0.54 m, while for the "zero-speed" case
this value was 1.51 m.

Figure 5.8 Histogram of absolute values of peaks and truncated Rayleigh distribution


A similar picture can be observed for the absolute value of the peaks. There were
32,717 peaks in total. Figure 5.9 shows a histogram of absolute values of peaks for the
15-knot case of the encounter waves. Visually, the hypothesis of a Rayleigh distribution looks
very plausible. However, the Pearson chi-square goodness-of-fit test rejects the
hypothesis because of the first two buckets.


[Figure 5.9 plot: pdf versus peak value, m. Pearson chi-square goodness-of-fit test: number of buckets 63, χ² = 787.46, d = 62.]

Figure 5.9 Histogram of absolute value of peaks for the case with forward speed 15 knots and Rayleigh distribution


As could be expected, the truncated Rayleigh distribution becomes applicable to
the absolute peaks of the following encounter waves at smaller values (Figure 5.10) in
comparison with the zero-speed case (Figure 5.8). The reasoning is similar: the narrow-band
spectrum makes the secondary peaks less likely, and large-amplitude data are likely
to have a period close to the mean period.

Based on the considerations above, it seems reasonable to assume that large peaks
follow the truncated Rayleigh distribution.

Figure 5.10 Histogram of absolute value of peaks for the case with forward speed 15 knots and truncated Rayleigh distribution


5.2.2.  Rare  Problem  for  Both-sides  Crossing 

Consider the rare problem for a peak-based envelope. The objective is to find the
conditional probability that, if the peak-based envelope crosses a given threshold a1,
it will exceed a given level a2.

Then, the conditional probability that the process will cross the level a2 if the
level a1 is crossed (formula 3.9), provided a2 > a1, and taking into account the Rayleigh
distribution of the envelope, is:


$$
P = \frac{\int_{a_2}^{\infty} f(x)\,dx}{\int_{a_1}^{\infty} f(x)\,dx}
  = \frac{1 - F(a_2)}{1 - F(a_1)}
  = \exp\left(-\frac{a_2^2 - a_1^2}{2V_x}\right) \tag{5.34}
$$


The value P expressed with formula (5.34) is the probability that if the process x
up-crosses the threshold a1, it will also up-cross the level a2. It can also be considered as
the fraction (actually, its limit value) of the upcrossings through the threshold a1 that
also cross the level a2.

As the sample process x is normal (the normal distribution is symmetric) and
centered (the mean value is zero), the same value P describes the conditional probability
that the process x will down-cross the level -a2 if it has previously down-crossed the
threshold -a1.

Then, this probability also describes the fraction of both-sides crossings of the
threshold ±a1 that will also cross the level ±a2.

The above consideration concludes that the problems of upcrossing,
downcrossing, and both-sides crossing have the same rare solution if the process is
normal and centered. In principle, this statement can be generalized for any symmetric
distribution, but this is outside the scope of the current consideration, as the sample
process is normal.

Finally, the solution of the rare problem for the both-sides crossing is identical to
the solution of the rare problem for upcrossing:


$$
P = \exp\left(-\frac{a_2^2 - a_1^2}{2V_x}\right), \qquad a_2 > a_1 \tag{5.35}
$$


Similar to the upcrossing case, formula (5.35) can be interpreted in terms of
absolute values of peaks: formula (5.35) expresses the conditional probability that the
absolute value of a peak exceeding the threshold a1 will also exceed the level a2. As the
absolute values of peaks have a truncated Rayleigh distribution, such an interpretation of
(5.35) is only valid when the threshold a1 lies above the truncation point of that distribution.
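The interpretation in terms of peaks can be checked numerically. The sketch below is an illustration under stated assumptions: the peaks are drawn from a plain (untruncated) Rayleigh distribution, and the values of `V_X`, `A1`, and `A2` as well as the function name `rare_solution` are hypothetical. It compares the exponential ratio of the rare solution with the observed fraction of peaks above the threshold that also exceed the level.

```python
import math
import random

def rare_solution(a1, a2, variance):
    """Conditional probability of exceeding a2 given that a1 is exceeded (Rayleigh tail ratio)."""
    if a2 < a1:
        raise ValueError("the level a2 must not be below the threshold a1")
    return math.exp(-(a2 * a2 - a1 * a1) / (2.0 * variance))

# Synthetic Rayleigh-distributed peaks; all numbers are illustrative.
random.seed(2)
V_X, A1, A2 = 1.0, 1.5, 2.5
peaks = [math.sqrt(-2.0 * V_X * math.log(1.0 - random.random())) for _ in range(200000)]

above_threshold = [p for p in peaks if p > A1]
fraction = sum(p > A2 for p in above_threshold) / len(above_threshold)

print(f"formula:            {rare_solution(A1, A2, V_X):.4f}")
print(f"observed fraction:  {fraction:.4f}")
```

The two numbers agree to within sampling error, which is the sense in which the rare solution is "a fraction (actually, its limit value)" of threshold exceedances.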

The complete theoretical solution for the both-sides crossing rate of the level a2
can be expressed as:

$$
\lambda(a_2) = \lambda(a_1)\,P \tag{5.36}
$$


Taking into account formula (5.25) for the both-sides crossing of the threshold a1
and simultaneously substituting (5.35):

$$
\lambda(a_2) = \frac{1}{\pi}\sqrt{\frac{V_{\dot{x}}}{V_x}}
\exp\left(-\frac{a_1^2}{2V_x}\right)\exp\left(-\frac{a_2^2 - a_1^2}{2V_x}\right) \tag{5.37}
$$


After simplification, formula (5.37) yields:

$$
\lambda(a_2) = \frac{1}{\pi}\sqrt{\frac{V_{\dot{x}}}{V_x}}
\exp\left(-\frac{a_2^2}{2V_x}\right) \tag{5.38}
$$


Formula (5.38) is identical to formula (5.16). It is the direct expression for the
rate of the both-sides crossing of a normal process with zero mean. This confirms the
applicability of the rare solution (5.35) for the both-sides crossings.
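The same consistency check can be done numerically. The sketch below assumes the standard expression for the both-sides crossing rate of a zero-mean normal process (twice the Rice upcrossing rate); the variances and levels are hypothetical values chosen only for illustration, and the function names are not from the source.

```python
import math

def both_sides_rate(a, var_x, var_dx):
    """Rate of crossing +/-a by a zero-mean normal process: twice the Rice upcrossing rate."""
    return math.sqrt(var_dx / var_x) / math.pi * math.exp(-a * a / (2.0 * var_x))

def rare_solution(a1, a2, var_x):
    """Conditional probability of crossing a2 once a1 has been crossed."""
    return math.exp(-(a2 * a2 - a1 * a1) / (2.0 * var_x))

# Hypothetical variances of the process and of its derivative, and two levels.
VAR_X, VAR_DX = 4.0, 1.0
A1, A2 = 4.0, 7.0

direct = both_sides_rate(A2, VAR_X, VAR_DX)
via_threshold = both_sides_rate(A1, VAR_X, VAR_DX) * rare_solution(A1, A2, VAR_X)
print(f"direct rate at a2:            {direct:.6e} 1/s")
print(f"threshold rate x rare part:   {via_threshold:.6e} 1/s")
```

Both routes give the same rate, because the threshold terms in the two exponents cancel.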

Direct application of formula (5.36) for extrapolation may encounter
difficulties related to the applicability of Poisson flow.

Strictly speaking, the applicability of the Poisson flow is required only for the
level where the probability of failure will be evaluated. In this case, this would be the
level a2. Therefore, theoretically, Poisson flow may not be applicable at the threshold a1,
but the method is still valid. However, by the very meaning of extrapolation, the crossing
events of the level a2 are not expected to be seen. Therefore, it is impossible to verify
the applicability of the Poisson flow for the level a2.

A practical way to ensure applicability of the Poisson flow at the level a2 is to
verify that it is applicable at a lower level, for which there is enough data to make a
judgment. If applicability of the Poisson flow has been confirmed for the threshold a1,
then it is applicable for the level a2.

As was shown previously (see Table 10), the applicability of the Poisson flow
for the both-sides crossing can only be seen above the threshold of 10.5 m. There were
only 54 events at that level. As was explained, due to strong autocorrelation, these
events have a tendency to appear in pairs, so the threshold has to be very high for one
side to be crossed while the other is not. This situation is expected to become even
worse for the encounter waves, as it takes longer for the autocorrelation function to die
out. These difficulties justify application of the envelope instead of the process itself (see
Section 4).
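The effect of pairing on Poisson-flow applicability can be illustrated with a simple check of the time intervals between events, which should be exponentially distributed for a Poisson flow. The sketch below is one common way to look at this and is not necessarily the procedure behind Table 10; the interval data are synthetic and the function name is hypothetical.

```python
import math
import random

def ks_distance_exponential(intervals):
    """Largest gap between the empirical CDF of the intervals and an
    exponential CDF with the same mean (Kolmogorov-Smirnov distance)."""
    xs = sorted(intervals)
    n = len(xs)
    mean = sum(xs) / n
    distance = 0.0
    for i, x in enumerate(xs):
        cdf = 1.0 - math.exp(-x / mean)
        distance = max(distance, abs(cdf - i / n), abs((i + 1) / n - cdf))
    return distance

random.seed(3)
N = 2000
# Independent events: exponential intervals with a mean of 1000 s.
independent = [random.expovariate(1.0 / 1000.0) for _ in range(N)]
# Events arriving in pairs: every second interval is a short within-pair gap of 5 s.
paired = [5.0 if i % 2 else random.expovariate(1.0 / 1000.0) for i in range(N)]

d_independent = ks_distance_exponential(independent)
d_paired = ks_distance_exponential(paired)
print(f"KS distance, independent events: {d_independent:.3f}")
print(f"KS distance, paired events:      {d_paired:.3f}")
```

The paired sequence is far from exponential because about half of its intervals are short, which mirrors the clustering of both-sides crossings described above.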


5.2.3.  Rare  Problem  for  Upcrossing  of  Envelope 

Consider  the  rare  problem  for  the  theoretical  envelope  upcrossing.  The  theoretical 
envelope  is  a  stationary  stochastic  process;  its  values  have  a  Rayleigh  distribution  and  its 
first  derivative  is  distributed  normally.  Based  on  these  considerations,  it  was  shown  in 
Section 4 that the upcrossing rate of the envelope can be presented as follows, see formula
(4.82):

Application of the general formula for the rare solution (equation 3.9), for the
case of the envelope, yields the following expression:

Taking into account (5.35), the relation between the solutions of the rare problem for
the upcrossing of the normal process itself and its envelope is:

$$
P_e = \frac{a_2}{a_1}\,P \tag{5.41}
$$
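The factor a2/a1 in (5.41) can be traced in a short derivation. As a sketch, assume that the envelope upcrossing rate has the standard form for a Rayleigh-distributed envelope with an independent, normally distributed derivative, written here with an unspecified constant C. The constant is an assumption of this illustration and is not taken from formula (4.82); it cancels in the ratio.

```latex
\lambda_e(a) = C\,\frac{a}{V_x}\exp\left(-\frac{a^2}{2V_x}\right)
\quad\Longrightarrow\quad
P_e = \frac{\lambda_e(a_2)}{\lambda_e(a_1)}
    = \frac{a_2}{a_1}\exp\left(-\frac{a_2^2 - a_1^2}{2V_x}\right)
    = \frac{a_2}{a_1}\,P
```

Under this assumption the envelope rare solution differs from the rare solution of the process only by the ratio of the level to the threshold.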


To  verify  the  solution  of  the  rare  problem,  consider  the  complete  one,  expressing 
the  rate  of  the  upcrossing  of  the  envelope  through  the  level  a2 : 


$$
\lambda_e(a_2) = \lambda_e(a_1)\,P_e \tag{5.42}
$$

Consider substitution of (5.39) and (5.40) into (5.42):

After simplification, expression (5.43) yields the following:

Formula (5.44) is identical in structure to formula (5.39) and explicitly expresses
the rate of upcrossing through the level a2. This confirms the correctness of the
solution of the rare problem (5.40).

5.2.4. Upcrossing of Peak-based Envelope vs. Theoretical Solution

As was shown in subsection 4.5, the rate of upcrossing of a peak-based
envelope may be quite different from the theoretical solution. Whether the theoretical
solution can be used to describe the rate of upcrossing depends on the width of the
spectrum. If the spectrum is narrow, the envelope is a slowly changing function (in
comparison with the process itself), and the peak-based envelope is quite close to the
theoretical envelope, see Figure 5.11(a). If the spectrum is relatively wide, the rate of
change of the envelope is comparable with the first derivative of the process, and the
differences between the peak-based and theoretical envelopes may be significant, see
Figure 5.11(b).

This explains the effect described in Section 4. The theoretical upcrossing
rate of the theoretical envelope (formula 4.83) has shown good agreement with the
statistical estimate of the upcrossing rate of the peak-based envelope for the following
wave case (see Figure 5.12a), while the theoretical solution and statistical estimates do
not agree for the zero-speed case (see Figure 5.12b). This creates a problem, as a
theoretical solution is needed to compare with the results of the EPOT method.
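The counting involved can be sketched as follows. The sketch assumes that the peak-based envelope is a piecewise-linear curve through the peaks of the absolute value of the process; the exact construction is given in Section 4 and may differ. The signal is a synthetic, slowly modulated sine standing in for a narrow-band record, and all names and numbers are illustrative.

```python
import math

def abs_peak_indices(x):
    """Indices of local maxima of |x|."""
    ax = [abs(v) for v in x]
    return [i for i in range(1, len(ax) - 1) if ax[i - 1] < ax[i] >= ax[i + 1]]

def peak_based_envelope(x):
    """Piecewise-linear curve through the peaks of |x| (an assumed construction).

    Returns (start_index, values); values[j] is the envelope at sample start_index + j.
    """
    idx = abs_peak_indices(x)
    if len(idx) < 2:
        raise ValueError("at least two peaks are needed")
    values = []
    for left, right in zip(idx, idx[1:]):
        for i in range(left, right):
            w = (i - left) / (right - left)
            values.append((1.0 - w) * abs(x[left]) + w * abs(x[right]))
    values.append(abs(x[idx[-1]]))
    return idx[0], values

def count_upcrossings(series, level):
    """Number of samples at which the series passes from at-or-below to above the level."""
    return sum(1 for prev, cur in zip(series, series[1:]) if prev <= level < cur)

# Synthetic narrow-band signal: carrier period 20 samples, modulation period 400 samples.
x = [(1.0 + 0.5 * math.sin(2 * math.pi * i / 400)) * math.sin(2 * math.pi * i / 20)
     for i in range(2000)]
_, envelope = peak_based_envelope(x)

LEVEL = 1.2
n_envelope = count_upcrossings(envelope, LEVEL)
n_both_sides = count_upcrossings([abs(v) for v in x], LEVEL)
print(f"envelope upcrossings: {n_envelope}, both-sides crossings: {n_both_sides}")
```

For this narrow-band example each envelope upcrossing covers a whole cluster of both-sides crossings of the process, so the envelope count is much smaller.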

Figure 5.11 Zoomed-in fragments of the peak-based envelope superimposed on the theoretical envelope:
a) “recorded” by the “gauge” moving with the waves (pure following seas) at a speed of 15 knots; b)
“recorded” by a fixed “gauge” (zero-speed case)

Figure 5.12 Statistical estimate of upcrossing rate (1/s) of the peak-based envelope: a) “recorded” by
the “gauge” moving with the waves (pure following seas) at a speed of 15 knots; b) “recorded” by a fixed
“gauge” (zero-speed case)


5.2.5. Both-Sides Crossing Rate as the Theoretical Solution

As was demonstrated above, there is a strong relationship between the absolute
value of peaks and both-sides crossings. It was also shown that the rate of the both-sides
crossing has a theoretical solution. For the case of a symmetric centered process, it is equal
to double the upcrossing rate, see formula (5.15).

The  event  of  the  both-sides  crossing  through  the  prescribed  level  obviously  is  the 
partial  stability  failure.  It  was  shown  also  that  the  Poisson  distribution  is  not  applicable  to 
this  event,  and  therefore  its  use  in  these  calculations  may  be  limited.  However,  it  still 
makes  sense  to  see  the  relationship  between  the  rate  of  the  both-sides  crossing  and  rate  of 
upcrossing  of  the  peak-based  envelope. 

Comparison of the statistical estimates of upcrossings of the peak-based envelope
with the theoretical rate of both-sides crossing is shown in Figure 5.13 for the following
waves case. Obviously, these are two different values; the theoretical rate of both-sides
crossings does not belong to the confidence interval of any of the statistical estimates.
However, the tendency of the estimates shows some signs of convergence with the
theoretical rate of both-sides crossing. This tendency can be confirmed by comparison of
two theoretical curves, not limited by gathered statistics. They are shown in Figure 5.14;
the convergence tendency is obvious.

The convergence of the rate of the both-sides crossing and the envelope
upcrossing is not universal; it is not true for the zero-speed case shown in Figure 5.15.
After the crossing, the curves show a tendency to diverge. In principle, the same effect can
be seen for the following waves, but it occurs at higher levels (~25 m vs. 10 m) and
much smaller values of the rates (1.E-18 vs. 1.E-3).

Figure 5.16 shows the theoretical rate of the both-sides crossing plotted along
with statistical estimates of upcrossing of the peak-based envelope for the zero-speed case.
One can notice that the theoretical value starts belonging to the confidence interval from
the level of 9.5 m and never leaves it after that level.

Figure 5.13 Theoretical rate of both-sides crossing and statistical estimate of upcrossing of the
peak-based envelope. Following Waves Case

Figure 5.14 Theoretical rate of both-sides crossing and upcrossing of the envelope. Following Waves Case

Figure 5.16 Theoretical rate of both-sides crossing and statistical estimate of upcrossing of the
peak-based envelope. Zero-Speed Case


These  comparisons  show  that  both-sides  crossing  rate,  in  principle,  can  be  used  as 
a  theoretical  solution  for  the  problem  of  upcrossing  of  peak-based  envelope. 

For  the  following  waves  case,  asymptotic  convergence  of  the  both-sides  crossing 
rate  and  envelope  upcrossing  rate  confirms  the  above  statement;  since  the  peak-based 
envelope  is  a  good  approximation  for  the  theoretical  envelope,  the  theoretical  envelope 
upcrossing  rate  can  be  used  for  a  close  approximation  as  well. 

The reason why the convergence is asymptotic seems to be as follows. The rate of
envelope upcrossing for lower levels is smaller than the rate of both-sides crossing, so
one crossing of the envelope corresponds to several both-sides crossings. The
reason for that is significant clustering due to the narrow spectrum. Once the level
increases, there are more chances that the process crosses the level only twice (both
sides), corresponding to the highest peak in the vicinity, as the chance that two
neighboring peaks are exactly the same is small.

For the zero-speed case, there is no convergence between the theoretical envelope
upcrossing rate and the both-sides crossing rate. The both-sides crossing rate is higher
than the envelope upcrossing rate up to a certain point and then becomes lower (point A
in Figure 5.15). This can be explained as follows. Before point A, the level is relatively
low, so there are significant chances that the level will be crossed in several periods
during a single crossing of the peak-based envelope. This mechanism is similar to what
was described for the following wave case. After point A, the envelope upcrossing rate is
higher than the both-sides crossing rate. That means the envelope has more crossings than the
process itself (both sides). This occurs because the envelope oscillates between peaks
of the process (see Figure 5.12b) and can cross the level while the peak remains below
the level. This is exactly the reason why the peak-based envelope was introduced in
Section 4. Therefore, the both-sides crossing rate is the value to trust here.

Figure 5.16 shows a convergence of the statistical rate of upcrossing of the
peak-based envelope with the theoretical rate of the both-sides crossing. The reason why
the difference is large for smaller values of the threshold is likely to be the same as above:
one crossing of the peak-based envelope corresponds to several both-sides crossings of
the process. Once the level is high enough that only the highest peak in a cluster
can reach it, the theoretical rate becomes included in the confidence interval.

5.2.6. Approximate Solution for Peak-Based Envelope Upcrossing

Considerations in the previous subsection established that the correct theoretical
solution, the rate of the both-sides crossing, is achieved asymptotically. This may
render inconclusive the comparison of the extrapolation results with the both-sides rate
alone, as a failure can always be attributed to the level not being “high enough.”

Therefore, it makes sense to use the upcrossing rate of the theoretical envelope as
another basis for comparison for the following wave case. This makes the following wave
case the only “clean” comparison, where the theoretical solution is available everywhere.

An approximate non-rare solution may be useful for the zero-speed case. It can be
developed by a least-squares approximation through the statistical estimate of
upcrossing. At least, such a solution can be used to test the solution for the rare problem.
The upcrossing rate is sought in the following form:

$$
\lambda = \exp\left(c_0 + c_1 z + c_2 z^2\right) \tag{5.45}
$$

Taking the natural logarithm of both sides of (5.45) and introducing a new
variable y yields:

$$
y = \ln(\lambda), \qquad y = c_0 + c_1 z + c_2 z^2 \tag{5.46}
$$

Coefficients c0, c1, and c2 are evaluated with the least-squares method using
statistical estimates for the upcrossing rates. The values of these coefficients are
characterized by significant variability from one dataset to another, see Table 14. The
result is shown in Figure 5.17.
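The least-squares step can be sketched as a quadratic fit to the logarithm of the rate. In the illustration below the input rates are synthetic: they are generated from the coefficients listed for the original data set in Table 14 over a hypothetical range of levels, so the fit simply recovers those coefficients. The function name and the level grid are assumptions of the example.

```python
import math

def fit_log_quadratic(levels, rates):
    """Least-squares fit of ln(rate) = c0 + c1*z + c2*z**2; returns (c0, c1, c2)."""
    y = [math.log(r) for r in rates]
    # Normal equations A c = b with A[j][k] = sum(z**(j+k)) and b[j] = sum(y * z**j).
    A = [[sum(z ** (j + k) for z in levels) for k in range(3)] for j in range(3)]
    b = [sum(yi * z ** j for z, yi in zip(levels, y)) for j in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, 3):
            factor = A[row][col] / A[col][col]
            for k in range(col, 3):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    c = [0.0, 0.0, 0.0]
    for row in (2, 1, 0):
        tail = sum(A[row][k] * c[k] for k in range(row + 1, 3))
        c[row] = (b[row] - tail) / A[row][row]
    return tuple(c)

# Synthetic "statistical estimates" built from the first row of Table 14.
true_c = (-5.427, 0.628, -0.093)
levels = [5.0 + 0.5 * i for i in range(15)]          # hypothetical levels, m
rates = [math.exp(true_c[0] + true_c[1] * z + true_c[2] * z * z) for z in levels]

c0, c1, c2 = fit_log_quadratic(levels, rates)
print(f"c0 = {c0:.3f}, c1 = {c1:.3f}, c2 = {c2:.3f}")
```

With real statistical estimates the residuals would not vanish, and the fitted coefficients would vary between datasets in the way Table 14 indicates.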


Table 14. Coefficients for curve fit for upcrossing rates

                          Coefficient for the curve fit
Data set                  c0         c1         c2
Original Data Set         -5.427     0.628      -0.093
Alternative Data Set 1    -1.054     -0.362     -0.038
Alternative Data Set 2    -5.209     0.715      -0.106

5.3. Extrapolation  with  EPOT:  Following  Wave  Case 

As was demonstrated above, only the following wave case allows comparison
with a robust theoretical solution. The reason is that, as the encounter spectrum is
narrow in following waves, the theoretical envelope becomes a slowly changing
function of time in comparison with the process itself. Therefore, the peak-based
envelope becomes a reasonable approximation of the theoretical envelope. Then the rate
of upcrossings of the peak-based envelope can be described by the theoretical formula,
which is available for the theoretical envelope.

That said, the following wave case truly represents a test bed for the method, since
the true answer is known from theory.

5.3.1. Distribution of Maxima of the Peak-Based Envelope

Application of the EPOT method is only slightly different from the POT method
that was described in detail in Section 3. Therefore, the focus must be on these
differences.

The first step is the search for maxima of the peak-based envelope, see Figure
5.18. The distribution of the maxima of the peak-based envelope is shown in Figure 5.19(a)
along with the Rayleigh distribution (however, the character of the distribution looks
lognormal rather than Rayleigh). At the same time, the truncated Rayleigh distribution
(see formula 5.30) is not rejected by the chi-square goodness-of-fit test for the tail of the
distribution starting at the 7 m level, see Figure 5.19(b).


Figure 5.18 Peak-based envelope (red) and its maxima

Figure 5.19 Distribution of maxima of the peak-based envelope superimposed with (a) Rayleigh
distribution, (b) truncated Rayleigh distribution


5.3.2. Fitting Weibull for Maxima of the Peak-Based Envelope

Similar to the POT method, the second step is building a histogram of maxima of
the peak-based envelope and fitting the Weibull distribution to these data. In this context,
the Weibull distribution is used just as a smoothing curve for the empirical distribution.

Only the data exceeding a given threshold are used to fit the Weibull distribution.
Exactly as in the case of the POT method described in Section 4, the first guess for the
parameters of the Weibull distribution is made using the moments method described
in Section 2, and then the method of maximum likelihood is applied. The samples are
shown in Figure 5.20 along with the results of testing of the goodness-of-fit. The fitted
distribution was not rejected in either case, for the thresholds of 8 m and 9 m.

The third step is evaluation of the confidence interval for the fitted distribution.
The technique for evaluation of the confidence interval is described in Section 3. The
idea is to find the confidence interval for the mean value and variance estimates and then
shift and scale the data accordingly. Once done, two more Weibull fits are performed on
the altered data, corresponding to the upper and lower boundaries. The sample result for
the 9 m threshold is shown in Figure 5.21.
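The two-stage fit can be sketched as follows. This is an illustration under several assumptions that are not taken from the source: the two-parameter Weibull distribution is fitted to the excess of each maximum over the threshold, the moments-based first guess uses a common empirical approximation for the shape parameter, and the data are synthetic. The function names are hypothetical, and the confidence-interval step is not reproduced.

```python
import math
import random

def weibull_moment_guess(y):
    """First guess from the sample mean and standard deviation
    (empirical approximation: shape ~ (std/mean)**-1.086)."""
    n = len(y)
    mean = sum(y) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in y) / (n - 1))
    shape = (std / mean) ** -1.086
    scale = mean / math.gamma(1.0 + 1.0 / shape)
    return shape, scale

def weibull_mle(y, shape0, iterations=50):
    """Newton iterations on the likelihood equation for the shape,
    followed by the closed-form maximum-likelihood scale."""
    n = len(y)
    logs = [math.log(v) for v in y]
    mean_log = sum(logs) / n
    k = shape0
    for _ in range(iterations):
        powers = [v ** k for v in y]
        s0 = sum(powers)
        s1 = sum(p * l for p, l in zip(powers, logs))
        s2 = sum(p * l * l for p, l in zip(powers, logs))
        g = s1 / s0 - 1.0 / k - mean_log
        dg = (s2 * s0 - s1 * s1) / (s0 * s0) + 1.0 / (k * k)
        step = g / dg
        k = max(k - step, 0.5 * k)   # keep the shape positive
        if abs(step) < 1e-10:
            break
    scale = (sum(v ** k for v in y) / n) ** (1.0 / k)
    return k, scale

def conditional_exceedance(level, threshold, shape, scale):
    """Probability that a maximum exceeds `level`, given that it exceeded `threshold`."""
    return math.exp(-(((level - threshold) / scale) ** shape))

# Synthetic excesses over a 9 m threshold (hypothetical scale 3 m and shape 2).
random.seed(4)
excess = [random.weibullvariate(3.0, 2.0) for _ in range(5000)]

shape_guess, scale_guess = weibull_moment_guess(excess)
shape, scale = weibull_mle(excess, shape_guess)
p_rare = conditional_exceedance(15.0, 9.0, shape, scale)
print(f"moments guess: shape {shape_guess:.3f}, scale {scale_guess:.3f}")
print(f"max. likelihood: shape {shape:.3f}, scale {scale:.3f}")
print(f"P(exceed 15 m | exceed 9 m) = {p_rare:.4f}")
```

A shape parameter close to 2 corresponds to a Rayleigh-type tail.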


Figure 5.20 Weibull fit for the maxima of the peak-based envelope exceeding the threshold of a) 9 m
and b) 8 m

Figure 5.21 Weibull CDF with confidence interval fitted for the maxima of the peak-based envelope
exceeding the threshold of 9 m


Figure  5.21  also  shows  the  Rayleigh  distribution  completely  contained  (at  least 
visually)  within  the  confidence  interval  and  almost  coinciding  with  the  Weibull  fit.  This 
is  consistent  with  previously  made  conclusions  that  the  Rayleigh  distribution  is  not 
rejected  for  the  tail  of  the  maxima  of  the  peak-based  envelope. 

The Weibull fit actually represents the conditional distribution for a peak of the
envelope to exceed a level if the given threshold was already exceeded.

5.3.3. Extrapolation with the Distribution of the Peak-Based Envelope

Once both non-rare and rare solutions are obtained, the procedure of extrapolation
is trivial: it is the application of formulae (3.22-3.24). The sample result is shown in
Figure 5.22 for the threshold of 9 m. The extrapolated solution is shown with its
confidence interval and superimposed with the rate of upcrossing of the theoretical envelope
as well as the theoretical rate for both-sides crossings. As the peak-based envelope is
considered a reasonable approximation of the theoretical envelope, the upcrossing rate
of the latter is expected to stay within the confidence interval of the extrapolated solution.
As can be seen from Figure 5.22, the theoretical solution stays within the
confidence interval until a certain level (18.5 m for the 9 m threshold), the
breaking point as it was defined in Section 3.
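The structure of the extrapolation can be sketched as the product of the two solutions. Formulae (3.22-3.24) are not reproduced in this section, so the sketch below is only an assumed form: the rate at the threshold is estimated as the number of events divided by the record length, its confidence interval uses a normal approximation, and the interval of the product is formed crudely from the products of the bounds. The count of 53 events in 100 hours follows the caption of Figure 5.22 and the record length quoted in this section; the conditional probabilities are hypothetical.

```python
import math

def upcrossing_rate_estimate(n_events, duration_s, z=1.96):
    """Rate estimate with a normal-approximation confidence interval (an assumption)."""
    rate = n_events / duration_s
    half_width = z * math.sqrt(n_events) / duration_s
    return rate - half_width, rate, rate + half_width

def extrapolate(rate_at_threshold, conditional_probability):
    """Complete solution: rate at the threshold times the rare solution."""
    return rate_at_threshold * conditional_probability

# Non-rare part: 53 exceedances of the 9 m threshold in 100 hours.
rate_low, rate_mid, rate_high = upcrossing_rate_estimate(53, 100 * 3600.0)

# Rare part: hypothetical lower bound, estimate, and upper bound of the
# conditional probability of reaching the level from the threshold.
p_low, p_mid, p_high = 1.0e-5, 1.0e-4, 1.0e-3

estimate = extrapolate(rate_mid, p_mid)
interval = (extrapolate(rate_low, p_low), extrapolate(rate_high, p_high))
print(f"rate at threshold: {rate_mid:.3e} 1/s")
print(f"extrapolated rate: {estimate:.3e} 1/s, interval {interval[0]:.3e} .. {interval[1]:.3e}")
```

A theoretical rate can then be compared with the interval level by level; the first level at which it falls outside corresponds to the breaking point discussed above.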

Similar to the upcrossing problem discussed in Section 3, the position of the
breaking points depends on the threshold, see Figure 5.23.


Figure 5.22 Extrapolated estimate of upcrossing rate of the peak-based envelope with confidence
interval as a function of crossing level. The threshold is 9 m, 53 peaks

Figure 5.23 Breakpoint level (the level below which the extrapolation is still good) vs. threshold


Figure 5.22 also shows the theoretical rate of the both-sides crossing. As was
shown above, this solution gives “the correct” answer only for very high levels and
only for the case of a narrow spectrum. This is the reason why the confidence interval does
not include the rate of the both-sides crossing for smaller levels; however, starting at the
level of ~11.4 m, the theoretical rate of the both-sides crossings enters the confidence
interval of the extrapolated solution and stays there until the level of 18.75 m because of
the convergence discussed above.

The  breaking  point  evaluated  for  the  both-sides  crossing  rate  behaves  similarly  to 
the  “envelope  breaking  point”  as  it  can  be  seen  from  Figure  5.23. 

Further analysis of the performance of the EPOT method is done for the level of
15 m. As can be seen from Figure 5.23, this is the level where the method starts
breaking down for some of the thresholds. Also, the event of upcrossing the level of 15 m is
very rare. The mean time to the event (based on the theoretical envelope upcrossing rate) is
about 7 years and 4 months. So, if 10 events are needed to estimate the rate, it will take
about 73 years of data, while the EPOT method only used 100 hours of data.
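The arithmetic behind this comparison is short enough to spell out. The only inputs are the figures quoted above (a mean time of about 7 years and 4 months, 10 events, 100 hours of data); the year length of 365.25 days is an assumption of the example.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

mean_time_years = 7 + 4 / 12                      # about 7 years and 4 months
rate_per_second = 1.0 / (mean_time_years * SECONDS_PER_YEAR)

events_needed = 10
years_needed = events_needed * mean_time_years    # record length for 10 events
hours_needed = years_needed * 365.25 * 24
ratio_to_epot = hours_needed / 100.0              # EPOT used 100 hours of data

print(f"upcrossing rate:  {rate_per_second:.2e} 1/s")
print(f"years for direct counting of 10 events: {years_needed:.1f}")
print(f"ratio to the 100 hours used by EPOT: {ratio_to_epot:.0f}")
```

Direct counting would therefore need a record several thousand times longer than the one used for extrapolation.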

Figure 5.24 shows the influence of the choice of the threshold on the rare solution
(the probability that the peak-based envelope exceeds the level of 15 m if the threshold is
exceeded), while Figure 5.25 shows the complete solution. Similar to Figure 5.22,
Figure 5.25 shows both theoretical solutions: the rate of upcrossings of the theoretical
envelope and the both-sides crossing rate. It is clearly seen from Figure 5.25 that for the
level of 15 m, the difference between the two theoretical solutions is small in comparison
with the width of the confidence interval of the statistical extrapolation.

The estimates oscillate around the theoretical solution (compare to Figure 3.22
and Figure 3.26 plotted for the upcrossing problem in Section 3), so averaging through
several levels will make the estimate more stable. Formulae (3.46-3.47) express this
averaging procedure. Averaging is performed for all the thresholds while the number of
points remains above 30 (30 points is considered enough to evaluate a histogram and fit
the distribution). The results of averaging are shown in Figure 5.26.
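The selection and averaging logic can be sketched as follows. Formulae (3.46-3.47) are not reproduced in this section, so the plain arithmetic mean of the estimates and of their confidence bounds used below is an assumption, as are all the numbers and names. Thresholds are kept as long as at least 30 points remain for the fit.

```python
from statistics import fmean

def average_over_thresholds(estimates, min_points=30):
    """Average the extrapolated estimates over all thresholds with enough points.

    `estimates` is a list of dicts with keys threshold, n_points, low, value, high.
    Returns the averaged dict and the number of thresholds used.
    """
    used = [e for e in estimates if e["n_points"] >= min_points]
    if not used:
        raise ValueError("no threshold has enough points for a fit")
    averaged = {key: fmean(e[key] for e in used) for key in ("low", "value", "high")}
    return averaged, len(used)

# Hypothetical extrapolated rates (1/s) for one level and several thresholds (m).
estimates = [
    {"threshold": 7.5, "n_points": 120, "low": 1.0e-9, "value": 4.0e-9, "high": 9.0e-9},
    {"threshold": 8.5, "n_points": 60, "low": 2.0e-9, "value": 6.0e-9, "high": 1.1e-8},
    {"threshold": 9.6, "n_points": 33, "low": 3.0e-9, "value": 5.0e-9, "high": 1.0e-8},
    {"threshold": 10.5, "n_points": 12, "low": 1.0e-10, "value": 9.0e-8, "high": 5.0e-7},
]

averaged, n_used = average_over_thresholds(estimates)
print(f"thresholds used: {n_used}")
print(f"averaged estimate: {averaged['value']:.2e} 1/s "
      f"({averaged['low']:.2e} .. {averaged['high']:.2e})")
```

The last threshold is excluded because only 12 points remain above it, which keeps a poorly supported fit from dominating the average.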
Figure 5.24 Rare solution for the level of 15 m


As can be seen from the insert in Figure 5.26, both theoretical solutions are
included in the confidence interval for the test level of 15 m. Breaking points are 16.5 m
and 17 m for the theoretical envelope upcrossing rate and the rate of both-sides crossings,
respectively.

Figure 5.25 Statistical extrapolation of the upcrossing rate of the peak-based envelope - complete
solution for the level of 15 m


Successful application of the averaging over several thresholds for the current
numerical example does not yet prove that it will work as well for all other cases. While
this seems to be impossible to prove, it still makes sense to try it on at least the two
alternative data sets used earlier in Section 3. Figure 5.27 shows the breakpoints
of these datasets as a function of the threshold. The lowest point is about 12 m.

Figure 5.28 shows the behavior of the rare solution and the complete extrapolated
estimate for 15 m using the two alternative datasets. The behavior is similar in principle
to that of the original set seen in Figure 5.24 and Figure 5.25. Most of the threshold
values enable the estimate “to catch” the theoretical solution in its confidence interval.

Figure 5.26 Averaged extrapolated estimate of rate of upcrossing of the peak-based envelope. Insert
shows the level of 15 m


Figure  5.27  Breakpoint  level  (the  level  below  which  the  extrapolation  is  still  good)  for  the 
extrapolated  estimate  of  upcrossing  of  peak-based  envelope  vs.  threshold  for  two  alternative  data  sets 

Figure  5.29  shows  results  of  the  averaging  technique  for  two  alternative  datasets. 
Both  theoretical  solutions,  the  rate  of  upcrossing  of  the  theoretical  envelope  and  the  rate 
of  both-sides  crossing,  are  within  the  confidence  interval  of  the  extrapolated  estimate. 
Data  for  breaking  points  are  shown  in  Figure  5.29  as  well. 

In general, the performance of the method can be characterized as satisfactory,
taking into account the rarity of the event of crossing the level of 15 m.

Figure 5.28 Extrapolated estimate of conditional probability that the process will exceed the level of
15 m if the threshold has been crossed - rare solutions (upper plots: a, b) and complete extrapolated
estimate (lower plots: c, d) for two alternative data sets for a2 = 15 m. Panels: a) rare solution,
alternative set 1; b) rare solution, alternative set 2; c) complete solution, alternative set 1; d) complete
solution, alternative set 2


Figure 5.29 Level 15 m: theoretical solution and extrapolated estimate averaged for (a) alternative set 1,
thresholds 7.5-9.6 m; the distribution for the threshold 9.6 m was fitted with 33 points; (b) alternative
set 2, range 7.5-9.6 m, with 30 points for the threshold 9.6 m.


5.3.4. Extreme Value Distribution of the Peak-Based Envelope

The extreme value distribution is an alternative way of solving the rare problem,
as was shown in Section 3. While the Weibull distribution is used in both cases, the
way the dataset is sampled makes the difference. Classic extreme value theory uses the
maxima of a process sampled within a constant time window. The size of this window
becomes a parameter that needs to be set in order to use the method. In principle, this size
must produce independent data points in the neighboring windows. While the influence
of the window size still needs to be studied, it was chosen to be 900 seconds for the
sample in this section. Figure 5.30 illustrates the procedure: only the point in window
1 was collected, as the maximum value in window 2 did not exceed the sample
threshold of 9 m.
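The sampling rule can be sketched as follows. The 900 s window and the 9 m threshold are the values quoted above; the record itself, its 300 s sampling step, and the function name are made up for illustration, and incomplete windows at the end of the record are simply dropped, which is an assumption.

```python
def window_maxima_over_threshold(series, dt, window_s, threshold):
    """Maximum of each complete window of length window_s seconds;
    only maxima above the threshold are kept."""
    per_window = int(round(window_s / dt))
    kept = []
    for start in range(0, len(series) - per_window + 1, per_window):
        peak = max(series[start:start + per_window])
        if peak > threshold:
            kept.append(peak)
    return kept

# Toy envelope record (m) sampled every 300 s, so each 900 s window holds three samples.
envelope = [7.2, 9.4, 8.1,   6.5, 8.8, 7.9,   9.1, 10.3, 9.7]

sample = window_maxima_over_threshold(envelope, dt=300.0, window_s=900.0, threshold=9.0)
print(sample)
```

In this toy record the second window contributes nothing because its maximum stays below 9 m, and the third window contributes a single value even though all of its samples exceed the threshold.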


Figure  5.30  Collecting  data  for  extreme  value  distribution,  threshold  9  m 


The further procedure and general character of the results are not very different from
the previous case, where the Weibull distribution was fitted using all maxima of the
peak-based envelope exceeding the threshold. Figure 5.31 shows the dependence of the
breakpoint level (the level up to which the extrapolated estimate still contains a theoretical
solution in its confidence interval) on the threshold, based on two theoretical solutions:
the upcrossing rate of the theoretical envelope and the theoretical rate of the both-sides
crossings. Both breakpoints are quite close to each other, due to the observed convergence
of the two theoretical solutions at higher levels.


Figure 5.31 Breakpoint level (the level below which the extrapolation is still good) vs. threshold for
the extrapolation based on extreme value distribution

Figure 5.32 shows the influence of the choice of the threshold on the rare solution
for the level of 15 m (the probability that the peak-based envelope exceeds the level of
15 m if the threshold is exceeded), see formula (3.44), while Figure 5.33 shows the
complete extrapolated estimate along with both theoretical solutions. Due to
convergence, the difference between the two theoretical solutions is small in comparison
with the width of the confidence interval.

Figure 5.32 Rare solution for the level of 15 m using extreme value distribution

Figure 5.33 Statistical extrapolation of the upcrossing rate of peak-based envelope - complete
solution for the level of 15 m based on extreme value distribution


Similar to Figure 5.24 and Figure 5.25 (and analogous to Figure 3.22 and
Figure 3.26, plotted for the upcrossing problem in Section 3), the estimates oscillate
around the theoretical solution, so averaging over several thresholds will make the
estimate more stable. Formulae (3.46)-(3.47) express this averaging procedure. As in
the first method (fitting Weibull to maxima, see previous subsections), averaging is
performed over all thresholds for which the number of points remains above 30 (30 points
is considered enough to evaluate a histogram and fit the distribution). The results of
averaging are shown in Figure 5.34. Both theoretical solutions are included in the
confidence interval for the test level of 15 m. The breaking points are 17.75 m and 18 m
for the theoretical envelope upcrossing rate and the rate of the both-sides crossings,
respectively.

Figure 5.34 Averaged estimate of rate of upcrossing of the peak-based envelope extrapolated using
extreme value distribution. Inset shows the level of 15 m


Successful application of averaging over several thresholds to the current
numerical example does not yet prove that it will work as well in all other cases. While
a general proof seems to be impossible, it still makes sense to try the procedure on at
least the two alternative data sets used earlier in Section 3. Figure 5.35 shows the
breakpoints for these data sets as a function of the threshold. The lowest point is
about 12 m.

Figure 5.36 shows the behavior of the rare solution and of the complete extrapolated
estimate for the level of 15 m using the two alternative data sets. The behavior is
essentially similar to that of the original set seen in Figure 5.32 and Figure 5.33, as
well as Figure 5.24, Figure 5.25 and Figure 5.28. Most threshold values enable the
estimate “to catch” the theoretical solution in its confidence interval.

Figure 5.35 Breakpoint level (the level below which the extrapolation is still good) for the estimate of
upcrossing of peak-based envelope extrapolated using extreme value distribution vs. threshold for
two alternative data sets


Figure 5.37 shows the results of the averaging technique for the two alternative data sets.
Both theoretical solutions (the rate of upcrossing of the theoretical envelope and the rate of
the both-sides crossing) are within the confidence interval of the extrapolated estimate.
Data for breaking points are shown in Figure 5.37.

Figure 5.36 Extrapolated estimate of conditional probability that the process will exceed the level of
15 m if the threshold has been crossed - rare solutions (upper plots: a, b) and complete extrapolated
estimate (lower plots: c, d) for two alternative data sets. Both cases use extreme value
distribution. Panels: a) Rare solution, alternative set 1; b) Rare solution, alternative set 2;
c) Complete solution, alternative set 1; d) Complete solution, alternative set 2




Figure 5.37 Level 15 m: theoretical solution and extrapolated estimate averaged for (a) set 1,
thresholds 7.5-9.6 m; the distribution for the threshold 9.6 m was fitted with 33 points; (b) set 2,
thresholds 7.5-9.6 m, with 30 points for the threshold 9.6 m. Panels: a) Alternative set 1;
b) Alternative set 2. Curve labels: extrapolated upcrossing rate of peak-based envelope using
Weibull fit of maxima; extrapolated upcrossing rate of peak-based envelope using extreme value
distribution; theoretical rate of both-sides crossing, breaking point 18.5 m; upcrossing rate of
theoretical envelope, breaking point 18.5 m.

Figure 5.34 and Figure 5.37 also show a comparison between the two estimates, one
based on the Weibull fit (described in the previous subsection) and one on the extreme
value distribution. The two confidence intervals overlap substantially, which means that
the two estimates are statistically indistinguishable.

The two estimates used rare solutions based on different ways of using the Weibull
distribution: as a distribution of maxima and as an extreme value distribution. The fact
that both of these ways have produced the same solution is notable; it can be used to check
the estimates against each other when theoretical solutions are not available.

The average breakpoint between all three datasets was 17.8 m (based on the rate
of upcrossing of the theoretical envelope). The mean time before such an event is about
330 years. Assuming that at least 10 events are needed to get a statistical estimate, it
would take about 3,300 years worth of data to get the result. The EPOT method produced
this result with only 100 hours of data; this makes the data reduction factor equal to
290,500.

5.4. Extrapolation with EPOT: Zero-Speed Case

5.4.1.  Approximate  Theoretical  Solution  for  Zero-Speed  Case 

As discussed above, there is no exact theoretical solution available for the
upcrossing of the peak-based envelope in the general case. Such a solution is only
available for the case of a relatively narrow-band spectrum, when the envelope
becomes a slowly changing function of time (in comparison with the process itself) and
the peak-based envelope becomes a relatively close approximation of the theoretical
envelope.

To overcome this difficulty, an approximate solution was proposed in one of the
previous subsections. The non-rare solution used regression to express the dependence of
the upcrossing rate on the threshold, see formula (5.45).

It is also assumed that large values of the peak-based envelope are likely to
follow a Rayleigh distribution. This assumption can be partially justified, as the
absolute values of the peaks do follow a truncated Rayleigh distribution (5.32). However,
the distribution of the absolute values of the peaks is not identical to the distribution
of the peak-based envelope, as points of the latter are calculated with linear
interpolation between the peaks (see Section 4). The assumption therefore needs to be
checked with a goodness-of-fit test on the maxima of the peak-based envelope. Once the
assumption has been checked and found to be acceptable, formula (5.40) is to be used for
the rare solution, and the complete solution can be formulated. However, it is only an
approximation. This means that agreement between this solution and the extrapolation
does not validate the method, nor would disagreement between the approximate solution
and the extrapolation invalidate it.
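
To make the Rayleigh assumption concrete, the sketch below evaluates the conditional probability that a Rayleigh-distributed maximum exceeds a level, given that it has exceeded a lower threshold. It uses only the standard Rayleigh tail, P(X > a) = exp(-a^2 / (2V)); formula (5.40) is not reproduced here and may differ in notation or detail. The function name and the parameter `variance` (the variance V of the underlying normal process) are assumptions of this example.

```python
import math

def rayleigh_conditional_exceedance(threshold, level, variance):
    """P(X > level | X > threshold) for a Rayleigh tail with parameter `variance`."""
    if level < threshold:
        raise ValueError("level must not be below the threshold")
    if variance <= 0.0:
        raise ValueError("variance must be positive")
    # Ratio of two Rayleigh survival functions; any truncation point at or
    # below the threshold cancels out of this ratio.
    return math.exp(-(level ** 2 - threshold ** 2) / (2.0 * variance))
```

Because the result is a ratio of tail probabilities, it is the same for the full and for the truncated Rayleigh distribution, provided the truncation point does not exceed the threshold.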

It still makes sense to see how the two other theoretical solutions (the rate of
upcrossing of the theoretical envelope and the theoretical rate of the both-sides crossings)
compare with the extrapolation result.

5.4.2. Distribution of Maxima of the Peak-Based Envelope

A sample record with the maxima of the peak-based envelope is shown in Figure
5.38. Compared with the similar picture for the following wave case in Figure 5.18, one
can see that the envelope is no longer a slowly changing process; as a result, the
population of maxima of the envelope should not be much different from the population
of absolute values of peaks of the process itself.

Figure 5.38 Peak-based envelope (red) and its maxima: zero-speed case


This explains the applicability of the truncated Rayleigh distribution, as shown in
Figure 5.39 (b), while the Rayleigh distribution as a whole remains inapplicable (see
Figure 5.39 (a)). This applicability can also be used to justify the assumption made
earlier for using formula (5.40) for the rare solution.

However, the applicability of the truncated Rayleigh distribution was judged
based on the Pearson chi-square goodness-of-fit test, which did not reject this hypothesis
for one particular dataset. At the least, this applicability needs to be checked for all
three datasets considered. The results of this check are summarized in Table 15. Note that
without removal of a single outlier in the last bin of the histogram for Alternative
Dataset 1, the goodness-of-fit test would reject the hypothesis for all starting values
with the exception of 10.25 m.


Figure 5.39 Distribution of maxima of the peak-based envelope superimposed with (a) Rayleigh
distribution, (b) truncated Rayleigh distribution. Zero-speed case


Table 15. Applicability of Truncated Rayleigh Distribution for Maxima of Peak-Based Envelope for
the Zero-Speed Case

Dataset        Start value  Number of points  Value of χ²  d   p       Comment
Original       7.10         567               20.5         13  0.0836
Alternative 1  6.40         876               27.1         17  0.0566  Remove an outlier in the last bin
Alternative 2  6.44         931               24.6         14  0.0776

5.4.3.  Extrapolation  with  the  Distribution  of  the  Peak-Based  Envelope 

As noted above, there is no exact theoretical solution with which to perform a correct
comparison with the extrapolated estimate of peak-based envelope upcrossing for the
zero-speed case. There is one approximate solution based on the regression formula for the
non-rare problem and two theoretical solutions known not to be completely applicable in
this case. These solutions are the theoretical rate of the both-sides crossing and the
upcrossing rate of the theoretical envelope. The comparison may yield interesting
information; however, strictly speaking, it cannot be used to validate or
invalidate the method.

Figure 5.40 shows the breakpoints based on the approximate solution, the
theoretical rate of the both-sides crossings, and the rate of upcrossing of the theoretical
envelope. Surprisingly, they are not much different, with the lowest point at about 13 m.
A similar picture can be seen in Figure 5.41 and Figure 5.42, except for the range of
7.8-8.6 m in Figure 5.42.

There is one important detail of how the breaking point was calculated. The rate
of upcrossing of the theoretical envelope may be significantly different from the
approximate solution, as the latter is based on the regression formula for the upcrossing
of the peak-based envelope. As was shown in Section 4, the rates of upcrossing for the
theoretical and the peak-based envelope are quite different for the zero-speed case.
Therefore, it may be expected that the rate of the theoretical envelope may be outside of
the confidence interval of the extrapolated estimate, especially for the lower threshold,
where the confidence interval is relatively narrow.

This is exactly what is observed in Figure 5.43. The alternative data set 2 clearly
illustrates this effect. The inset of Figure 5.43 zooms in on the initial range of the curves.


Figure 5.40 Breakpoint level (the level below which the extrapolation is still good) vs. threshold for
the extrapolation based on fitted distribution of maxima for zero-speed case.


Figure 5.41 Breakpoint level (the level below which the extrapolation is still good) vs. threshold for
the extrapolation based on fitted distribution of maxima for zero-speed case. Alternative data set 1


Figure 5.42 Breakpoint level (the level below which the extrapolation is still good) vs. threshold for
the extrapolation based on fitted distribution of maxima for zero-speed case. Alternative data set 2


Figure 5.43 Extrapolated estimate of upcrossing rate of the peak-based envelope with confidence
interval as a function of crossing level. The threshold is 9 m, 227 peaks, alternative set 2

As can be clearly seen from Figure 5.43, the curve of the rate of upcrossing
of the theoretical envelope starts outside of the confidence interval of the extrapolated
estimate. Then, somewhere around 13 m, it enters the confidence interval. It leaves
the confidence interval again just short of the level of 18 m.

The behavior of the theoretical rate of the both-sides crossing is somewhat similar. It
also may start outside of the confidence interval. The reason is that several both-sides
crossing events may be covered by one peak-based envelope upcrossing, for example if two
neighboring positive peaks happen to be above the threshold (see Figure 5.44). That is
why the theoretical rate of the both-sides crossing is larger than the extrapolated
estimate and the approximate solution for relatively low levels. However, once the level
increases, a situation similar to the one shown in Figure 5.44 becomes very rare: only
one peak at a time has a chance to be above the level. This makes the both-sides crossing
rate converge to, and even cross, the approximate solution at point A in Figure 5.43. As
there is no reason why the theoretical rate of the both-sides crossing should be lower
than the upcrossing rate of the peak-based envelope, the further behavior of the curve
can be explained by the approximate nature of the solution that uses the regression
formula.

The behavior of the theoretical rate of the both-sides crossing and the rate of the
theoretical envelope was compared and discussed earlier, see Figure 5.15.


Figure 5.44 On the difference between peak-based envelope crossing and both-sides crossing

Concluding the discussion, both the theoretical rate of the both-sides crossing and
the upcrossing rate of the theoretical envelope may take values outside of the confidence
interval of the extrapolated estimate for lower thresholds. However, both curves may
enter the confidence interval for larger values of the threshold. Therefore, it makes sense
not to start the search for the breaking point from the very beginning. The level 13.25 m
was used as the initial level for Figure 5.40 through Figure 5.42.
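
The breakpoint search with a delayed start can be sketched as follows. This is a hypothetical Python illustration of the idea described above, not the code used to produce the figures; the function name, the argument layout (parallel lists on a common grid of levels), and the convention of returning `None` when the reference solution is already outside the interval at the first scanned level are assumptions.

```python
def find_breakpoint(levels, ci_low, ci_high, reference, start_level=None):
    """Return the highest level reached before `reference` leaves the interval.

    Scanning begins at `start_level` (or at the first level if it is None)
    and stops at the first level where the reference solution falls outside
    [ci_low, ci_high].
    """
    breakpoint_level = None
    for lvl, lo, hi, ref in zip(levels, ci_low, ci_high, reference):
        if start_level is not None and lvl < start_level:
            continue  # skip the initial range where the curve may be outside
        if lo <= ref <= hi:
            breakpoint_level = lvl
        else:
            break
    return breakpoint_level
```

With a reference curve that is outside the interval at low levels, enters it around 13 m and leaves it later, a search from the very beginning stops immediately, while a search started at 13 m or above reports the level at which the curve finally leaves the interval.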

The breaking point for the approximate solution, however, was searched for
starting from the very beginning. This made a difference only in Figure 5.42 in the range
of 7.8-8.6 m. If the breakpoint for the approximate solution is searched for starting at the
level 13.35 m, like for other solutions, the flat segment disappears, see Figure 5.45.

Figure 5.46 shows how the approximate solution left the confidence interval
around 9.5 m and re-entered it around 11.5 m. A possible reason for such behavior
is the use of the regression formula in the approximate solution, which generally increases
the level of uncertainty. Figure 5.46 also shows that the approximate solution stays very
close to the upper boundary of the confidence interval. This problem does not exist for
higher thresholds. This needs to be taken into account when choosing the range of averaging.


Figure 5.45 Breakpoint level (the level below which the extrapolation is still good) vs. threshold for
the extrapolation based on fitted distribution of maxima for zero-speed case. Alternative data set 2


Figure 5.46 Extrapolated estimate vs. approximate solution for the threshold 7.8 m, 791 peaks,
alternative data set 2




5.4.4.  Averaged  Extrapolation  Based  on  Weibull  Fit  of  Maxima 


The complete extrapolated solution plotted for different threshold levels (see
Figure 5.47) shows some spreading of the extrapolated estimate around the theoretical
solutions; therefore, averaging with formulae (3.46) and (3.47) can improve the accuracy
of the estimate. Figure 5.48 shows the averaged estimates for the different levels of the
original dataset.

Figure 5.47 Statistical extrapolation of the upcrossing rate of peak-based envelope - complete
solution for the level of 13 m based on distribution of peak-based envelope


Figure 5.48 Averaged estimate of rate of upcrossing of the peak-based envelope extrapolated using
Weibull fit of maxima. Inset shows the level of 13 m

Values of breakpoints of averaged estimates for all datasets are shown in Table 16. The
averaging was done with the following rule: the highest threshold should have at least 30
points. The lowest threshold is half-way back from the highest threshold to the level
where the Poisson flow is still applicable. This is an empirical rule based on the
observation that the higher thresholds tend to perform better for the extrapolation based
on the Weibull fit of maxima.


Table 16. Breakpoints for Averaged Estimates based on Weibull Fit of Maxima

Breaking point, m

Dataset            Approximate solution  Theoretical rate of      Upcrossing rate of
                                         both-sides crossing      theoretical envelope
Original           20.3                  19.3                     20.3
Alternative Set 1  15.75                 15.25                    16.25
Alternative Set 2  19.3                  18.8                     19.8

5.4.5.  Extrapolation  Based  on  Extreme  Value 

The procedure applied for extrapolation based on the extreme value distribution
for the zero-speed case is exactly the same as it was for the following wave case. Figure
5.49 illustrates the data collection: only one point in Window 1 was collected (the
maximum value), even though one more peak in that window exceeded the sample
threshold of 9 m; in Window 2 there is only one point above the threshold.

Figure 5.49 Collecting data for extreme value distribution, threshold 9 m, zero-speed case

Since there is no exact theoretical solution, the results may be compared with the
same three solutions used for the extrapolation based on the Weibull fit of the maxima.
Besides the approximate solution based on the regression formula for the non-rare problem,
these solutions include the theoretical rate of the both-sides crossing and the upcrossing
rate of the theoretical envelope.

As can be seen from the discussion in the previous subsection, the difference
between these solutions is not that large, especially in comparison with the width of the
confidence interval for the extrapolated estimate. Some difference was found for
relatively low values of the threshold, where the two theoretical solutions could go
outside of the confidence interval; this has to be accounted for while calculating the
values of the breakpoints.

The  breakpoint  values  of  the  extrapolation  based  on  the  extreme  value  distribution 
are  shown  in  Figure  5.50  through  Figure  5.52  for  all  three  data  sets. 




Figure  5.50  Breakpoint  level  (the  level  below  which  the  extrapolation  is  still  good)  vs.  threshold  for 
the  extrapolation  based  on  extreme  value  distribution  for  zero-speed  case.  Original  data  set 


Figure  5.51  Breakpoint  level  (the  level  below  which  the  extrapolation  is  still  good)  vs.  threshold  for 
the  extrapolation  based  on  extreme  value  distribution  for  zero-speed  case.  Alternative  data  set  1 


Figure 5.52 Breakpoint level (the level below which the extrapolation is still good) vs. threshold for
the extrapolation based on extreme value distribution for zero-speed case. Alternative data set 2

The behavior of all three solutions in terms of the breakpoints is quite similar and
does not require additional comment, with the exception of Figure 5.52, where results
for Alternative data set 2 are shown. All of the curves have a flat segment in the range of
thresholds 7.5-8.2 m. This is an area where relatively poor performance was observed.
The approximate solution left the confidence interval around the 11 m level and the two
other solutions never entered the confidence interval at all. The rest of the threshold
values have shown normal performance.

The averaging procedure was applied over all of the thresholds for which the number
of data points was above 30. The result for the original data set is shown in Figure 5.53.
As one can see from this figure, all three solutions stay within the confidence interval
until the level of 19.5 m is reached. The inset in Figure 5.53 shows the comparison between
the theoretical solutions and both extrapolated estimates for the level of 13 m.

The results on the breakpoints of the averaged extrapolated estimates based on the
extreme value distribution are shown in Table 17.


Figure 5.53 Averaged estimate of rate of upcrossing of the peak-based envelope extrapolated using
extreme value distribution. Inset shows the level of 13 m

Table 17. Breakpoints for Averaged Estimates based on Extreme Value Distribution

Breaking point, m

Dataset            Approximate solution  Theoretical rate of      Upcrossing rate of
                                         both-sides crossing      theoretical envelope
Original           20.3                  19.3                     20.3
Alternative Set 1  13.75                 15.25                    14.5
Alternative Set 2  19.25                 18.75                    19.75

The lowest breakpoint in Table 17 is 13.75 m. It corresponds to a mean time to
event of 11.89 days. To get a statistically credible estimate, say, 10 events are
needed. These 10 events may require almost 120 days worth of data. At the same time,
the method allowed us to get the estimate with only about 4 days of data (100 hours). So
120 days, or 2880 hours, was reduced to 100 hours by the use of the method. That is,
the efficiency of the method in the worst case resulted in a data reduction factor of 28.
Averaging the breakpoint between all three cases brings it to 17.77 m and the data
reduction factor up to 71,260. These data, however, have to be considered preliminary, as
more performance checks are expected.
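
The worst-case arithmetic above can be reproduced with a short calculation. The sketch below is illustrative only; the function name and its arguments are assumptions, and the figures plugged in are the ones quoted in the text (11.89 days to event, 10 events, 100 hours of data).

```python
def data_reduction_factor(mean_time_to_event_h, data_used_h, events_needed=10):
    """Ratio of the data needed for direct counting to the data actually used."""
    direct_counting_h = events_needed * mean_time_to_event_h
    return direct_counting_h / data_used_h

# Worst case from the text: mean time to event of 11.89 days, 100 hours of data.
factor = data_reduction_factor(11.89 * 24.0, 100.0)
print(round(factor, 1))  # about 28.5, quoted in the text as 28
```

The unrounded product gives roughly 2854 hours of direct observation, which the text rounds up to 120 days (2880 hours) before quoting the factor.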

5.5.  Summary 

The  principal  objective  of  this  work  is  to  find  a  practical  solution  for  the 
probability  of  a  partial  stability  failure  during  a  given  time.  The  goal  of  this  section  is  to 
find  out  how  to  use  the  Peak-Over-the-Threshold  (POT)  method  for  partial  stability  in  the 
form  of  a  large  roll  event. 

The large roll event is equally dangerous on either side of a ship. Therefore it
should be described as a random event of “both-sides crossing”, combining an
upcrossing of a level on the positive side or a downcrossing of a level on the negative side.
If the boundary is the same for both sides and the process has a symmetric distribution, the
rate of the both-sides crossing is equal to twice the rate of the upcrossing. The Poisson
flow assumption is only applicable for a relatively high level of both-sides crossings, as
upcrossing and downcrossing events occurring during one period are not independent.

An upcrossing of the envelope of the process is a random event, theoretically
equivalent to the both-sides crossing. Poisson flow is applicable to the envelope
upcrossing. Also, the envelope upcrossing is equivalent to a random event that an
absolute value of a peak has exceeded that level. Since the peaks of the envelope are
used, the new version of the method is called “Envelope Peaks-Over-the-Threshold”
(EPOT).

The problem of the upcrossing of the peak-based envelope is that it does not have
a closed-form solution for a generic spectrum, even for a normal process. Nevertheless,
such a solution is needed to compare results of sample calculations. If a spectrum is
narrow, then the peak-based envelope is a close approximation for the theoretical
envelope. As a result, the formula for the upcrossing of the theoretical envelope of a
normal process can be used. Therefore, only the example with a narrow-band spectrum
can be used to complete the theoretical checking of the EPOT method.

To check the method using the example of a generic spectrum, an approximate
solution is needed. Following the principle of separation, such a solution can be presented
in the form of non-rare and rare sub-problems. The non-rare sub-problem is just an
upcrossing of the peak-based envelope. It can be approximated by fitting a regression
formula to the statistics of upcrossings. The solution of the rare sub-problem is the
probability that a process will exceed a given level if it has crossed a threshold below that
level. This solution can be developed using the fact that absolute values of peaks of a
normal process follow a truncated Rayleigh distribution starting at a certain value. This
value depends on the bandwidth of the spectrum.
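
The principle of separation summarized above can be illustrated with a short sketch: the rate at the level of interest is the rate of threshold upcrossings (non-rare, here estimated by direct counting) multiplied by the conditional probability of exceeding the level once the threshold is crossed (rare). The function names and arguments are hypothetical, and the formula for the probability of at least one event assumes Poisson flow; confidence intervals are left out of this sketch.

```python
import math

def extrapolated_rate(n_upcrossings, record_duration_s, conditional_prob):
    """Rate of exceeding the level: threshold rate times P(level | threshold)."""
    threshold_rate = n_upcrossings / record_duration_s  # non-rare sub-problem
    return threshold_rate * conditional_prob            # rare sub-problem

def probability_of_event(rate, exposure_s):
    """Probability of at least one event during the exposure under Poisson flow."""
    return 1.0 - math.exp(-rate * exposure_s)

# Illustrative numbers only: 200 threshold upcrossings in 100 hours of data
# and a conditional probability of 1e-3 from the fitted distribution.
rate = extrapolated_rate(200, 100 * 3600.0, 1.0e-3)
mean_time_to_event_s = 1.0 / rate
```

The mean time to event follows as the reciprocal of the rate, which is the quantity used above to express the data reduction factor.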

Finally, the EPOT method was checked against both examples and showed quite
satisfactory performance, with average data reduction factors of 290,500 and 71,260,
respectively.

6. Algorithm Implementation

This  section  describes  the  reference  implementation  of  the  EPOT  algorithm.  The 
reference  implementation  is  coded  in  Matlab  (requires  Matlab  R2008a  or  later). 

6.1. Envelope  Construction 

6.1.1. Peak Definition for the Process

Two definitions of a peak are implemented and may be used to define the
envelope and then the peaks of the envelope. The first definition, referred to as a
“zero-crossing peak” (ZC), defines a peak as the maximum value between a zero
upcrossing and a zero downcrossing. Negative peaks (troughs) are computed as the
minimum value between a zero downcrossing and a zero upcrossing and then reflected
about zero.

Alternatively, peaks may be defined as local maxima. In this formulation, a particular
point is considered to be a peak if it is greater than the three previous and three following
points.

The  zero-crossing  peak  method  is  used  for  two  reasons.  First,  the  zero-crossing 
method  is  much  more  reliable  on  signals  that  have  noise.  Several  assumptions  (related  to 
noise  frequency,  the  motion  frequency,  sampling  rate,  etc.)  need  to  be  made  (or  filtering 
employed)  to  find  the  peaks  on  a  noisy  signal.  Second,  the  zero-crossing  method 
removes  secondary  peaks,  which  are  not  of  interest  to  us. 
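
The zero-crossing peak definition can be sketched as below. This is a minimal Python illustration and not the reference Matlab implementation; the function name, the choice to treat an exact zero as non-negative, and the decision to discard the partial excursions before the first and after the last zero crossing are assumptions of this example.

```python
def zero_crossing_peaks(signal):
    """Return (index, value) pairs, one per excursion between zero crossings.

    Positive excursions give their maximum; negative excursions give their
    minimum reflected about zero, so all returned values are non-negative.
    """
    peaks = []
    start = None  # index of the first sample of the current excursion
    for i in range(1, len(signal)):
        up = signal[i - 1] < 0 <= signal[i]
        down = signal[i - 1] >= 0 > signal[i]
        if up or down:
            if start is not None:
                segment = signal[start:i]
                # Largest absolute value within a same-sign excursion.
                k = max(range(len(segment)), key=lambda j: abs(segment[j]))
                peaks.append((start + k, abs(segment[k])))
            start = i
    return peaks
```

Because only one value is kept per excursion, secondary local maxima inside the same half-cycle are dropped, which is the second reason given above for preferring this definition.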

6.1.2. Envelope Definition

To  construct  the  composite  peak-based  envelope,  the  negative  peaks  are  reflected 
about the reference level. For a process such as the rolling of a ship, this reference level is
zero,  since  deviations  from  upright  are  what  are  important,  not  deviations  from  the  mean 
value.  For  cases  where  deviations  from  the  mean  value  are  important,  the  signal  should 
be  de-meaned  (and  the  levels  of  interest  should  be  given  relative  to  the  mean  of  the 
process).  The  time  history  of  the  envelope  is  constructed  through  linear  interpolation 
between  the  peaks  at  the  sampling  frequency  of  the  input  signal. 
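
The interpolation step can be sketched as follows, assuming the peaks are given by their sample indices and (already reflected) values. The function name is hypothetical, and leaving the samples before the first peak and after the last peak undefined (`None`) is an assumption of this sketch rather than a documented choice of the reference implementation.

```python
def interpolate_envelope(peak_indices, peak_values, n_samples):
    """Linearly interpolate between consecutive peaks at every sample index."""
    envelope = [None] * n_samples  # undefined outside the first and last peak
    pairs = list(zip(peak_indices, peak_values))
    for (i0, v0), (i1, v1) in zip(pairs, pairs[1:]):
        if i1 <= i0:
            raise ValueError("peak indices must be strictly increasing")
        for i in range(i0, i1 + 1):
            envelope[i] = v0 + (v1 - v0) * (i - i0) / (i1 - i0)
    return envelope
```

Working in sample indices keeps the envelope on the same time grid as the input signal, so its duration and sampling frequency match the original record.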

Figure 6.1. Sample Envelope with Linear Interpolation

6.1.3. Peak Definition for the Envelope

The  peaks  of  the  envelope  are  defined  in  a  similar  manner  to  the  peaks  of  the  signal. 
Instead  of  utilizing  zero  crossings,  mean  crossings  of  the  envelope  are  used.  The  mean  of 
the  envelope  is  computed  using  the  interpolated  time  history  and  the  envelope  peaks  are 
the  maxima  and  minima  between  mean  crossings.  Only  the  peaks  above  the  envelope 
mean  are  used  further  in  the  algorithm. 

6.2.  Candidate  Threshold  Selection 

Initially, sixteen thresholds are defined. These thresholds are linearly spaced between
the mean value of the envelope time history and an upper threshold. The upper threshold
is defined by the requirement that we have at least 30 points to fit a distribution. The
peaks of the envelope are sorted in descending order and the 31st entry in the sorted list is
the upper threshold.

The candidate thresholds are analyzed for the applicability of Poisson Flow to
exceedances of the thresholds (using the envelope peak data). The tests for Poisson Flow
applicability are discussed in Section 6.3. The lowest threshold that passes both tests for
Poisson Flow applicability is taken as the lowest threshold for use in the statistical
extrapolation. A new set of eight thresholds is then linearly distributed between this lower
threshold and the upper threshold described in the previous paragraph.
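
A minimal sketch of the threshold selection described in this section follows. The helper names are hypothetical, and the sketch assumes that both end points (the envelope mean and the upper threshold) are counted among the thresholds, which the text does not state explicitly.

```python
def linear_thresholds(lower, upper, count):
    """`count` levels from `lower` to `upper`, both ends included (assumption)."""
    step = (upper - lower) / (count - 1)
    return [lower + k * step for k in range(count)]


def upper_threshold(envelope_peaks, min_points=30):
    """The 31st largest peak, which leaves 30 larger peaks for the fit."""
    ordered = sorted(envelope_peaks, reverse=True)
    if len(ordered) <= min_points:
        raise ValueError("not enough envelope peaks to define the upper threshold")
    return ordered[min_points]


# Illustrative data only: 100 envelope peaks and an envelope mean of 0.0.
peaks = [float(v) for v in range(1, 101)]
upper = upper_threshold(peaks)
candidates = linear_thresholds(0.0, upper, 16)

# Suppose candidates[5] is the lowest threshold passing both Poisson flow tests.
refined = linear_thresholds(candidates[5], upper, 8)
print(upper, candidates[:3], refined[:3])
```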

6.3.  Analysis  of  Poisson  Flow  Applicability 

Sections 1.3.5 and 5.1.3 describe two methods for assessing the applicability of
Poisson Flow for a given threshold: a Pearson chi-squared test and a Kolmogorov-Smirnov
test (K-S Test).

6.3.1. Pearson χ² Test

In  a  Pearson  chi-squared  test,  the  hypothesis  is  posited  that  the  number  of  events 
in  a  certain  time  span  is  distributed  via  a  Poisson  distribution.  To  carry  out  this  test,  the 
envelopes  derived  from  the  time  histories  are  first  concatenated  together.  The  peaks  of 
the  envelopes  have  already  been  found,  so  there  are  no  peaks  at  the  concatenation  point. 
A  window  size  (time  span)  is  then  defined  to  count  events  (peaks  over  the  threshold). 
This  time  span  is  computed  as: 


$$\mathrm{WindowSize} = \frac{T_{sample}}{N_{POT}} \tag{6.1}$$


Where $T_{sample}$ is the total duration of the concatenated envelopes and $N_{POT}$ is the
total number of peaks above the threshold in question. Defining the window size in this
way generally limits the maximum number of events in a given window to about 4. This
essentially sets the intensity (the single parameter) of the Poisson distribution to 1.0. The
concatenated envelope is then divided into sections, each with a length of the window
size. The number of POT in each window is counted. A Poisson distribution is fit to this
data set using the maximum likelihood method. A good starting point for the intensity of
the distribution in the fitting process is the mean number of events per window. This
process is repeated for window lengths of 0.8 and 1.2 times the original length. The
goodness of fit metric is then averaged over the three window spans. This is done
since there is some variability of the results with window size due to the random process
and the finite record length.

See Figure 6.2 for a sample PDF used for the chi-squared test. If the averaged
goodness of fit metric is above the accepted significance level (0.05), then the
distribution fit is accepted and one of the two criteria for Poisson Flow at this threshold is
satisfied.
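
The procedure above can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the report: the "goodness of fit metric" is taken to be the p-value of the Pearson test, counts are binned as 0, 1, 2 and "3 or more", the maximum likelihood estimate of the Poisson intensity is taken directly as the mean count per window, and all function names are hypothetical.

```python
import math


def chi2_sf(x, dof):
    """Chi-squared survival function for integer dof >= 1."""
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-y)             # Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y) = erfc(sqrt(y))
    while a < dof / 2.0 - 1e-9:
        # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def poisson_pmf(k, lam):
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1.0))


def pearson_p_value(event_times, total_time, window, tail_bin=3):
    """p-value of a Pearson test that the counts per window are Poisson."""
    n_windows = int(total_time // window)
    counts = [0] * n_windows
    for t in event_times:
        idx = int(t // window)
        if idx < n_windows:          # events in a partial last window are dropped
            counts[idx] += 1
    lam = sum(counts) / n_windows    # maximum likelihood estimate of the intensity
    observed = [0] * (tail_bin + 1)  # bins 0 .. tail_bin-1 and "tail_bin or more"
    for c in counts:
        observed[min(c, tail_bin)] += 1
    probs = [poisson_pmf(k, lam) for k in range(tail_bin)]
    probs.append(1.0 - sum(probs))
    statistic = sum((o - n_windows * p) ** 2 / (n_windows * p)
                    for o, p in zip(observed, probs))
    dof = len(observed) - 2          # bins - 1 - one fitted parameter
    return chi2_sf(statistic, dof)


def pearson_poisson_flow(event_times, total_time, significance=0.05):
    """Average the p-value over window scales 1.0, 0.8 and 1.2."""
    base = total_time / len(event_times)  # equation (6.1)
    p_values = [pearson_p_value(event_times, total_time, scale * base)
                for scale in (1.0, 0.8, 1.2)]
    mean_p = sum(p_values) / len(p_values)
    return mean_p, mean_p > significance


# Perfectly periodic events are far from Poisson flow, so the test should fail.
events = [0.5 + k for k in range(200)]
print(pearson_poisson_flow(events, 200.0))
```

Here `event_times` are the times of the peaks over the threshold on the concatenated envelope and `total_time` is its duration.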
Figure 6.2. Sample Distribution Fit to Number of Events in Time Windows


6.3.2. Kolmogorov-Smirnov (K-S) Test

For the K-S Test, the empirical cumulative distribution function (CDF) of the time with
no event, derived from the time history, is compared to the CDF that is computed using
the exponential distribution and a statistically calculated mean crossing rate (see Section
6.4). The empirical CDF is derived from the time history as described in Section 1.3.4.

The smallest time window analyzed is based on the decay of the autocorrelation
function of the process. The peaks of the autocorrelation function are found (using the
zero crossing method) and the first peak with a value below 0.05 is identified. The smallest
time window is set to the time of this peak. If multiple records are available for a given
condition, then the autocorrelation functions of the records are averaged before the peak
search is performed. For short duration records, the computed autocorrelation functions
can behave poorly (even after averaging); they do not decay as expected. The lack of
decay is due almost entirely to a deficiency in the amount of data. If no peaks of the
averaged autocorrelation function are found to be below 0.05, then an alternative
approach is taken. In this case, the peak-based envelope of the autocorrelation function is
computed and the time of the first local minimum of the envelope is taken as the decay time.
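
The search for the decay time can be sketched as below. The function names are hypothetical, and "a value below 0.05" is interpreted here as the magnitude of the peak being below 0.05, which is an assumption. The fallback based on the envelope of the autocorrelation function is not implemented in this sketch; `decay_lag` simply returns `None` when it would be needed.

```python
def autocorrelation(x, max_lag):
    """Normalised sample autocorrelation for lags 0 .. max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    var = sum(d * d for d in dev)
    return [sum(dev[i] * dev[i + lag] for i in range(n - lag)) / var
            for lag in range(max_lag + 1)]


def zero_crossing_peaks(values):
    """Index of the largest-magnitude sample in each run between sign changes."""
    peaks, start = [], 0
    for i in range(1, len(values)):
        if (values[i] >= 0.0) != (values[start] >= 0.0):
            peaks.append(max(range(start, i), key=lambda j: abs(values[j])))
            start = i
    return peaks  # the unfinished last run is ignored


def decay_lag(acf, level=0.05):
    """Lag of the first zero-crossing peak whose magnitude is below `level`."""
    for lag in zero_crossing_peaks(acf):
        if abs(acf[lag]) < level:
            return lag
    return None


# Illustrative autocorrelation values, not data from the report.
acf = [1.0, 0.5, -0.3, -0.6, -0.2, 0.1, 0.3, 0.1,
       -0.02, -0.04, -0.01, 0.02, 0.01]
print(decay_lag(acf))
```

The returned lag is in samples; multiplying by the sampling interval gives the smallest time window. For several records, the lists returned by `autocorrelation` would be averaged element by element before calling `decay_lag`.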

Each window is examined to see if there is at least one event in it. An auxiliary
variable p_i is constructed that is equal to 1 if there is at least one event in the window and
zero if there is no event in the window. The mean of p_i is computed for each record. This
value is the probability that at least one event will occur within the window. The
windows are then grouped in sets of two and the process is repeated for a window size
twice as big. The original windows are then grouped in sets of three, etc. This combining
of windows continues until a single window is produced. For the case of multiple records, the
probabilities are averaged across the records. This process creates the empirical
cumulative distribution function (EDF) for the probability of at least one event occurring.
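
A minimal sketch of the window grouping for a single record is given below. The function name is hypothetical, and discarding base windows that do not fill a complete group is an assumption of this sketch.

```python
def at_least_one_event_edf(event_flags):
    """event_flags[i] is 1 if base window i contains at least one event, else 0.

    Returns a list of (group_size, probability) pairs, where the probability
    is the fraction of grouped windows holding at least one event."""
    n = len(event_flags)
    edf = []
    for size in range(1, n + 1):
        groups = n // size  # assumption: leftover base windows are discarded
        hits = sum(1 for g in range(groups)
                   if any(event_flags[g * size:(g + 1) * size]))
        edf.append((size, hits / groups))
    return edf


flags = [0, 1, 0, 0, 1, 0, 0, 0]
for size, prob in at_least_one_event_edf(flags):
    print(size, prob)
```

Multiplying each group size by the base window length gives the time axis of the EDF. With several records, the probabilities for each group size would be averaged across records.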

The theoretical CDF is then computed using the exponential distribution, as in
equation (1.38). The K-S Test is carried out between these two distribution functions using
equations (1.126), (1.127), and (1.129). Because of the limited data set, the EDF can
become unreliable for high values of probability. For this reason an upper limit is set,
above which the comparison, using equation (1.126), neglects the data. The upper limit
on the EDF is typically set at 0.65 (in terms of probability). The goodness of fit is only
checked for values below this.
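
The truncated comparison can be sketched as follows. Equations (1.38) and (1.126) are not reproduced in this section, so the sketch assumes the usual forms: the exponential CDF 1 - exp(-rate * t) and the largest absolute difference between the two functions as the K-S distance. The function name is hypothetical, and the conversion of the distance to a significance level, equations (1.127) and (1.129), is not shown.

```python
import math


def ks_distance_exponential(window_lengths, edf_values, rate, edf_cap=0.65):
    """Largest |EDF - exponential CDF|, ignoring EDF points above `edf_cap`."""
    distance = 0.0
    for t, p_emp in zip(window_lengths, edf_values):
        if p_emp > edf_cap:
            continue  # the EDF is considered unreliable at high probability
        p_theory = 1.0 - math.exp(-rate * t)
        distance = max(distance, abs(p_emp - p_theory))
    return distance


rate = 0.2
times = [1.0, 2.0, 3.0, 10.0]
edf = [1.0 - math.exp(-rate * t) for t in times]
edf[1] += 0.04  # a deviation inside the range that is checked
edf[3] = 0.99   # a large deviation above the cap, which is ignored
print(ks_distance_exponential(times, edf, rate))
```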


6.4. Calculation of Threshold Exceedance Rates

6.4.1. Estimate of Threshold Exceedance (Upcrossing) Rate

The threshold exceedance (upcrossing) rates are computed statistically. For the
EPOT method the threshold crossing rate is given by:

$$\lambda = \frac{N_{POT}}{T} \tag{6.2}$$

Where $N_{POT}$ is the number of peaks above the threshold and T is the total duration of the
sample. Equation (6.2) is derived from equation (3.19).

The  threshold  crossing  rate  is  not  strictly  needed  when  the  extreme  value 
distribution  is  used  for  the  rare  problem  (EVPOT  method).  It  may,  however,  be 
computed  as  follows: 


$$\lambda = -\frac{1}{T} \ln\left(1 - \frac{N_{MOT}}{N_{Maxima}}\right) \tag{6.3}$$

Where $N_{MOT}$ is the number of maxima over the threshold, $N_{Maxima}$ is the total number of
maxima, and T is again the total sample time. Equation (6.3) is equivalent to equation
(2.60). Here, the quantity ($N_{MOT} / N_{Maxima}$) is the probability that a given maximum value
will be above the threshold and the quantity ($1 - N_{MOT} / N_{Maxima}$) is the empirical CDF for
the sample maxima.
6.4.2. Estimate of Confidence Interval for Threshold Exceedance Rate

The confidence interval on λ is computed by first computing the confidence
interval for $N_{POT}$ or $N_{MOT}$. Since each of these quantities may be considered the result of
a set of Bernoulli trials, we may use the binomial distribution to compute the confidence
interval.


The binomial distribution has two parameters, commonly taken as n and p. Here n is
the number of trials (in our case the number of time steps) and p is the probability that
any particular time step is a peak (or extremum) over the threshold. p is given simply by:

$$p = \frac{N_{POT}}{n} \tag{6.4}$$


The upper and lower bounds of the confidence interval of the estimate of λ are
then given by:


(6.5)


Where P is the probability complementary to the confidence probability and Q is the function inverse to the CDF.

If n is sufficiently large (above 200), then a normal distribution may be
substituted for the binomial distribution, since the factorials for the binomial distribution
can be difficult to compute for large n. In this case one would use the mean and variance
of the binomial distribution (np and np(1-p), respectively) as the parameters of the
normal distribution. Details of the theoretical background are given in Section 1.2.
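
The normal substitution can be illustrated with a short sketch. It assumes a symmetric two-sided interval on the event count, which is then divided by the sample duration to give bounds on the rate. This form is an assumption made for illustration and is not a reproduction of equation (6.5); the function names are hypothetical.

```python
from statistics import NormalDist


def count_interval_normal(n_trials, n_events, confidence=0.95):
    """Two-sided interval on the event count via the normal approximation."""
    p = n_events / n_trials
    mean = n_trials * p                        # binomial mean, n p
    std = (n_trials * p * (1.0 - p)) ** 0.5    # binomial std, sqrt(n p (1 - p))
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    low = max(0.0, mean - z * std)
    high = min(float(n_trials), mean + z * std)
    return low, high


def rate_interval(n_trials, n_events, duration, confidence=0.95):
    """Bounds on the exceedance rate: count bounds divided by the duration."""
    low, high = count_interval_normal(n_trials, n_events, confidence)
    return low / duration, high / duration


# Illustrative numbers: 100 peaks over the threshold in 10,000 time steps
# covering 1,000 seconds.
print(count_interval_normal(10000, 100))
print(rate_interval(10000, 100, 1000.0))
```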


6.5. Distribution Fits to Peaks/Maxima Over the Threshold

A two-parameter Weibull distribution is fit to the peaks and maxima over the
threshold. Application of the method of maximum likelihood estimation (MLE) was
discussed in Section 2.1.4. However, independent review of this work suggested that,
when fitting a Weibull distribution with a shape parameter close to 1.0, the MLE method
is known to have convergence issues. In general, the peak and maxima data being fit
have this character. It was suggested that a least squares fitting approach may be more
appropriate. For this reason the distribution is fit using a least squares fit to the
empirically derived cumulative distribution function (EDF). The EDF is derived by
ordering the data from smallest to largest. The EDF of a given value located at position i
in the ordered set is computed as:


$$EDF_i = \frac{i}{n_{samples} + 1} \tag{6.6}$$

Where i is the index (starting at 1) of the value in the ordered set and $n_{samples}$ is the number of
peaks or maxima over the threshold in the data set. A minimization of the squared error
between the EDF and the computed CDF is then performed by adjusting the Weibull
distribution parameters.
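
As an illustration, the sketch below builds the EDF of equation (6.6) and evaluates the squared error named in the text. To obtain parameter values it uses a linearised least squares fit (a straight-line regression of ln(-ln(1 - EDF)) on ln(x)), which is a common shortcut and not the direct minimization described above; its result could serve as a starting point for that minimization. The sketch assumes the samples are positive values described by the CDF 1 - exp(-(x/scale)^shape); how the data are referenced to the threshold is not specified here. All function names are hypothetical.

```python
import math


def weibull_cdf(x, shape, scale):
    return 1.0 - math.exp(-((x / scale) ** shape))


def squared_error(samples, shape, scale):
    """Sum of squared differences between the EDF (6.6) and the Weibull CDF."""
    xs = sorted(samples)
    n = len(xs)
    return sum((i / (n + 1) - weibull_cdf(x, shape, scale)) ** 2
               for i, x in enumerate(xs, start=1))


def fit_weibull_linearized(samples):
    """Least squares line through (ln x, ln(-ln(1 - EDF))); not the direct fit."""
    xs = sorted(samples)
    n = len(xs)
    u = [math.log(x) for x in xs]
    v = [math.log(-math.log(1.0 - i / (n + 1))) for i in range(1, n + 1)]
    u_mean = sum(u) / n
    v_mean = sum(v) / n
    shape = (sum((a - u_mean) * (b - v_mean) for a, b in zip(u, v))
             / sum((a - u_mean) ** 2 for a in u))
    scale = math.exp(u_mean - v_mean / shape)  # intercept equals -shape * ln(scale)
    return shape, scale


# Synthetic check: samples placed exactly at the Weibull quantiles of the EDF.
n = 30
data = [2.0 * (-math.log(1.0 - i / (n + 1))) ** (1.0 / 1.5) for i in range(1, n + 1)]
shape, scale = fit_weibull_linearized(data)
print(shape, scale, squared_error(data, shape, scale))
```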
6.6. Calculation of Exceedance Rates for Levels of Interest


6.6.1. Estimate of Level Exceedance Rates

Finally, the exceedance rates for the levels of interest are computed. The calculation
combines the threshold crossing rate and the distribution fit to the peaks/maxima
over the threshold. For the rare problem using the distribution fit (details are given in
subsection 3.3.1), the exceedance rate $\lambda_2$ for an arbitrary level $a_2$ is given by:

$$\lambda_2 = \lambda_1 \cdot \left(1 - F_{POT}(x < a_2)\right) \tag{6.7}$$

Where $\lambda_1$ is the threshold crossing rate and $F_{POT}$ is the CDF for the distribution of
envelope peaks over the threshold.

When the extreme value distribution is used for the rare problem (see subsection
3.3.3), we first compute the probability that a given maximum value is over the threshold:

$$p_{MOT} = \frac{N_{MOT}}{N_{Maxima}} \tag{6.8}$$

Where $N_{Maxima}$ is the total number of maxima and $N_{MOT}$ is the number of maxima over the
threshold. Defining $p_{MOT}$ allows easier definition of the confidence interval on $\lambda_2$ in the
subsequent section. The exceedance rate, $\lambda_2$, is then given by:

$$\lambda_2 = -\frac{1}{T} \ln\left(1 - p_{MOT} \cdot \left(1 - F_{MOT}(x < a_2)\right)\right) \tag{6.9}$$

Where $F_{MOT}$ is the CDF of the distribution of maxima and T is the total duration of all
sample data.
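
The two rate formulas of this subsection, a threshold rate scaled by the fitted tail probability for peaks and a logarithmic form for maxima, can be written as a short sketch. The function names are hypothetical; the fitted CDF is passed in as a value already evaluated at the level of interest, so no assumption is made about how the Weibull fit is parameterized.

```python
import math


def level_rate_pot(threshold_rate, cdf_at_level):
    """Level exceedance rate from the peaks-over-threshold fit."""
    return threshold_rate * (1.0 - cdf_at_level)


def level_rate_mot(p_mot, cdf_at_level, duration):
    """Level exceedance rate from the extreme value (maxima) fit."""
    return -math.log(1.0 - p_mot * (1.0 - cdf_at_level)) / duration


# Illustrative numbers only.
print(level_rate_pot(0.01, 0.999))       # threshold rate 0.01 1/s, F_POT = 0.999
print(level_rate_mot(0.1, 0.999, 3600.0))

# At the threshold itself the fitted CDF is zero, and the maxima formula
# reduces to -ln(1 - p_MOT) / T.
print(level_rate_mot(0.1, 0.0, 3600.0))
```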

6.6.2. Estimate of Confidence Interval for Level Exceedance Rates

For the rare problem using the simple distribution fit for peaks, the confidence
interval on the level exceedance rates is a composite of the confidence intervals for
the threshold crossing rate and the distribution fit.


(6.10)


For the rare problem using the extreme value distribution, we first define the
confidence interval on $p_{MOT}$. This is done using equation (6.5), where n is the total
number of maxima and p is $p_{MOT}$. The confidence interval on the exceedance rate is then
given by:


(6.11)
6.6.3. Averaging of Results for All Thresholds

As discussed in Section 6.2, the level exceedance rates are computed using eight
different thresholds. After computing the level exceedance rates using each threshold,
the results for all thresholds are averaged together. This is done by averaging the
logarithm of the crossing rate estimates as follows:

$$\lambda_{avg} = 10^{\frac{1}{n_{Thresholds}} \sum_{i=1}^{n_{Thresholds}} \log_{10}(\lambda_i)} \tag{6.12}$$

Where $n_{Thresholds}$ is the number of thresholds and $\lambda_i$ is the estimate of the level crossing
rate for threshold i.

This log averaging method is used because we are generally interested in the order of
magnitude of the level exceedance rate estimates. It is possible that one of the estimates is
several orders of magnitude larger than the others. In this case the order of magnitude of
the result of straight averaging is the order of magnitude of the outlier divided by the
number of estimates (eight in our case). That is, the presence of the outlier can greatly
skew the order of magnitude of the result if straight averaging is used.
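
The effect of an outlier on the two kinds of average can be seen in a short numerical sketch. The rates below are illustrative values, not results from the report, and the function name is hypothetical.

```python
import math


def log_average(rates):
    """Average of the base-10 logarithms, converted back to a rate."""
    return 10.0 ** (sum(math.log10(r) for r in rates) / len(rates))


# Seven estimates near 1e-6 and one outlier four orders of magnitude larger.
rates = [1e-6] * 7 + [1e-2]

straight = sum(rates) / len(rates)
print(straight)            # about 1.25e-3, dominated by the outlier
print(log_average(rates))  # about 3.2e-6, close to the bulk of the estimates
```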

6.7.  Summary 

The algorithm of EPOT (as implemented) consists of the following steps:

1. Search for zero-crossing peaks.

2. Evaluate the peak-based envelope by reflecting negative peaks and using linear
interpolation.

3. Calculate the mean level of the envelope and search for the mean-crossing
peaks of the envelope.

4. Define 16 thresholds; the upper threshold must have at least 30 peaks of the
envelope above it; the rest of the thresholds are linearly spaced between the
upper threshold and the mean of the envelope.

5. Check Poisson flow applicability for each threshold with both the Pearson
chi-squared test and the Kolmogorov-Smirnov test. The thresholds for which both
tests pass are retained.

6. Estimate the threshold exceedance (upcrossing) rate and its confidence
interval for each threshold.

7. Evaluate the empirical cumulative distribution function for peaks of the envelope
exceeding each threshold; fit the two-parameter Weibull distribution with the
least squares method. Calculate boundaries of the confidence interval for the fit.

8. Fit the two-parameter Weibull distribution as an extreme value distribution
using the specified time window for peaks of the envelope exceeding each
threshold. Calculate boundaries of the confidence interval for the fit.

9. Calculate the exceedance (upcrossing) rate for the level of interest using results
for each threshold; calculate boundaries of the confidence interval for each value
of the exceedance rate.

10. Find the average exceedance rate over all the thresholds and the boundaries of its
confidence interval. This is the final result.
7. Concluding Comments

7.1.  The  Problem 

Given a time history of the response of a nonlinear dynamical system, the probability
of exceeding a given level during a given time is to be found. It is understood that there
are no observations (or only a statistically insignificant number of them) that exceed this
level. Therefore the formulated problem implies statistical extrapolation.

7.2. The Approach

By its very essence, any extrapolation method is a judgment on data outside of the
observed range, based on the data within the observed range. Ship motion data are
statistically dominated by relatively small values lying in the linear range; therefore, an
attempt to use all the data for statistical extrapolation in fact leads to a prediction based on
a linear assumption. To avoid “unintentional linearization” of the problem, only large-value
data that contain information on the nonlinearity of the dynamical system should be
used for extrapolation. The methods that use only the data above a certain threshold are
known as “Peak-over-Threshold” (POT) methods.

The POT method can be considered an implementation of the principle of
separation, in which the problem of estimating the probability of a rare event is divided in
two: non-rare and rare. The non-rare problem is meant to be solvable with conventional
statistical methods. In the case of the problem being considered, the non-rare problem is an
estimate of the exceedance (upcrossing) rate for a threshold that separates small-valued data in
the linear range from the data where the influence of nonlinearity may be considered
“significant”. The rare problem is the actual statistical extrapolation using only the data
above the threshold.

7.3.  The  Study 

As the probability of exceedance is dependent on the time of exposure, the relation
between the probability and the time was the first subject of this study. It was concluded
that Poisson flow should be used to relate probability and the time of exposure.
Application of Poisson flow allows use of the exponential distribution for the time
before/between the events of exceedance/upcrossing and requires that the events be
independent. Pearson chi-squared and Kolmogorov-Smirnov goodness-of-fit tests can be
used to check the applicability of Poisson flow.
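
For reference, the standard consequence of Poisson flow can be written compactly. The notation is assumed for this summary: λ is the exceedance (upcrossing) rate, T is the time of exposure, P(T) is the probability of at least one exceedance during T, and f(t) is the probability density of the time t before or between events.

```latex
P(T) = 1 - e^{-\lambda T},
\qquad
f(t) = \lambda \, e^{-\lambda t}, \quad t \ge 0
```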

Extreme value theory was and is considered the main tool for addressing the
problem of evaluating the probability of exceedance with statistical extrapolation. In
order to relate this study to other work in the field, the relation between the extreme value
distribution and time was examined. It was found that the correct interpretation of
an extreme value distribution is inherently related to a certain time duration: an extreme
value distribution describes the behavior of the largest value observed during a given
time.


Further consideration was focused on the properties of peaks. Obviously, if a peak of
the process exceeds the level, the process has exceeded the level. At the same time, the
peak data are easier to work with as they do not depend on the sampling rate; although
consecutive peaks are not independent, the correlation between them is weaker than
that between two consecutive values of the time history.

The rare problem can be formulated in two ways, both of which involve fitting the
two-parameter Weibull distribution. In the first, the Weibull distribution is fit to all the peaks
above the threshold and serves simply to smooth the histogram. The second is based on the
properties of the Weibull formula as one of the three extreme value distributions, where only
the largest peak observed during a time window is used for the fit.

The assumption of independence of consecutive exceedances (upcrossings) may turn out
to be over-restrictive for certain types of practical applications. One of them is the analysis of
motions in stern quartering seas while the ship has significant forward speed. The
encounter spectrum becomes quite narrow due to the Doppler effect. As a result, the motion
response becomes highly clustered, and satisfying the independence requirement may become
non-trivial. Another important practical consideration arises when exceedance of the level on
both sides is of interest. In the latter case the assumption of independence generally is
not applicable, with the exception of a few very specific cases. If a ship has had a large roll
angle to one side, then it is very likely to have a large excursion to the other side, as the
autocorrelation function stays fairly substantial after just half a period of the motions.
For such applications, it makes sense to work with the envelope of the process rather than
with the time history of the process itself.

It is very efficient to use an envelope with a narrow-band process, as the envelope
changes significantly more slowly than the process itself. For a more general case, a piecewise
linear or peak-based envelope (linear interpolation between the absolute values of local or
zero-crossing peaks) is found to be more robust. It was shown that statistical
extrapolation can be based on the envelope peaks over the threshold. Also, it was shown that
the procedure for the envelope peaks does not differ much from the procedure for the
peaks of the time history of the process.

A numerical example was used throughout the study. The data set for the numerical
example consisted of 200 time histories of wave elevations. Each time history was 30
minutes long and was reconstructed from Bretschneider spectra with a Fourier series.
The distribution of these wave elevations is normal, so the probability of exceedance during a
given time is known from upcrossing theory in closed form. Application of the EPOT
method to this data set shows quite satisfactory performance, with average data reduction
factors from 71,260 to 290,500 (the factor of how much more data would be needed to
get the same result directly from statistics).
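
For a stationary normal process, the closed form referred to above is commonly written with Rice's formula for the mean upcrossing rate, combined with the exponential law of Poisson flow. The notation is assumed here and may differ from the report's own derivation in its earlier sections: a is the level measured from the mean, σ_x is the standard deviation of the process, and σ_ẋ is the standard deviation of its time derivative.

```latex
\lambda(a) = \frac{1}{2\pi} \, \frac{\sigma_{\dot{x}}}{\sigma_x}
             \exp\!\left( -\frac{a^2}{2 \sigma_x^2} \right),
\qquad
P(T) = 1 - \exp\bigl( -\lambda(a) \, T \bigr)
```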

7.4. The Outcome

As a result of the work described in this report, the EPOT procedure was developed,
justified and implemented. The procedure requires input time histories that can be
of either numerical or experimental origin. The output is an estimate of the exceedance
(upcrossing) rate that allows calculation of the probability of exceedance during a given time.
As the EPOT procedure relies on statistical estimates, confidence intervals are
evaluated and carried through the entire procedure to the final result. The
procedure was partially validated using wave elevation data.

7.5. Future Work

The present report describes the statistical aspects of the method. However, both the
threshold and the level of interest may be subject to additional limitations coming from the
dynamical aspect of the problem.

The core approach assumes that there is available data that carries information on
the nonlinear properties of the dynamical system, and that this data is located above a certain
threshold. Setting minimum levels for the threshold cannot be performed based solely on
statistical data, as it requires knowledge of when the system becomes nonlinear. As the
first expansion, for roll motion of a conventional ship, this minimum threshold can be
taken from the calm water GZ curve, which is the stiffness of the dynamical system in roll. For
most conventional ships, the boundary between linear and nonlinear is around 10-12
degrees.

Another limitation is related to the maximum level of interest for which the EPOT
results can still be considered legitimate. While there are no statistical limitations on the
level of interest, the physical characteristics of the dynamical system do change with the
level. The instantaneous GZ curve, which plays the role of stiffness for roll motions, must have
a maximum. For most ships, there are three equilibria for roll: the upright position, the angle of
vanishing stability and the capsized position. The maximum of the GZ curve can be considered a
boundary between the attractor at the upright position and the repeller at the angle of
vanishing stability. Therefore, the character of the nonlinearity is quite different before and
after the angle of the maximum of the GZ curve. As roll angles exceeding the maximum of
the GZ curve are quite rare, the chances are that the rare problem will not have enough
information on the behavior of the system beyond the angle of the maximum. Therefore
the upper limit of EPOT may be expected somewhere around the angle of the maximum of
the GZ curve.

Setting up limits for the lowest threshold and the highest level of interest requires a
formal procedure that still needs to be developed. This procedure is likely to be based on
the dynamical characteristics of the ship rather than statistical data.

Thus far, the only validation which has been performed was done on a wave elevation
dataset. This dataset essentially represents the simplest linear system. Therefore the next
step in validation of the procedure would be validation on the response of a nonlinear
dynamical system. This system should be simple enough that direct Monte-Carlo
simulation can generate enough data for “brute-force” statistical
processing that will provide the “correct answer”. A subset of the generated data should
then be used with EPOT to provide an extrapolated result, which is expected to match the
“correct answer”.